
US 20030027135A1 

(19) United States 

(12) Patent Application Publication (10) Pub. No.: US 2003/0027135 A1

Ecker et al. (43) Pub. Date: Feb. 6, 2003 



(54) METHOD FOR RAPID DETECTION AND 
IDENTIFICATION OF BIOAGENTS 

(76) Inventors: David J. Ecker, Encinitas, CA (US); 

Richard Griffey, Vista, CA (US); 
Rangarajan Sampath, San Diego, CA 
(US); Steven Hofstadler, Oceanside, 
CA (US); John McNeil, La Jolla, CA 
(US) 

Correspondence Address: 
Woodcock Washburn Kurtz 
Mackiewicz & Norris LLP 
One Liberty Place - 46th Floor 
Philadelphia, PA 19103 (US) 

(21) Appl. No.: 09/798,007 



(22) Filed: Mar. 2, 2001 

Publication Classification 

(51) Int. Cl.7 C12Q 1/68; C12P 19/34

(52) U.S. Cl. 435/6; 435/91.2

(57) ABSTRACT 

Method for detecting and identifying unknown bioagents, 
including bacteria, viruses and the like, by a combination of 
nucleic acid amplification and molecular weight determina- 
tion using primers which hybridize to conserved sequence 
regions of nucleic acids derived from a bioagent and which 
bracket variable sequence regions that uniquely identify the 
bioagent. The result is a "base composition signature" (BCS)
which is then matched against a database of base composi- 
tion signatures, by which the bioagent is identified. 



Patent Application Publication Feb. 6, 2003 Sheet 1 of 18 US 2003/0027135 A1







FIGURE 1G: 23S rRNA Domain I
FIGURE 1H: 23S rRNA Domain IV
FIGURE 1I: 16S rRNA Domain III
[Consensus diagrams not reproduced. Legible primer-pair region labels: 115-203, 193-256, 1778-1841, 1100-1188.]



FIGURE 2: 16S rRNA Domain III, region 1100-1188 [secondary-structure diagram not reproduced]



FIGURE 3 [diagram not reproduced]



FIGURE 4 [schematic not reproduced; legible labels: T* tag, ΔMASS (T* - T) = X; C* tag, ΔMASS (C* - C) = Y]



FIGURE 5 [mass spectra not reproduced; x-axis: MW, 13500 to 14500]
B. anthracis (A14G9C14T9), MW = 14072.2
B. anthracis* (A1A*13G9C14T9), MW(meas) = 14280.9



FIGURE 6 [spectra not reproduced]







US 2003/0027135 A1



1 



Feb. 6, 2003 



METHOD FOR RAPID DETECTION AND 
IDENTIFICATION OF BIOAGENTS 

STATEMENT OF GOVERNMENT SUPPORT 

[0001] This invention was made with United States Gov- 
ernment support under DARPA/SPO contract BAA00-09. 
The United States Government may have certain rights in 
the invention. 

FIELD OF THE INVENTION 
[0002] The present invention relates to methods for rapid 
detection and identification of bioagents from environmen- 
tal, clinical or other samples. The methods provide for 
detection and characterization of a unique base composition 
signature (BCS) from any bioagent, including bacteria and 
viruses. The unique BCS is used to rapidly identify the 
bioagent. 

BACKGROUND OF THE INVENTION 
[0003] Rapid and definitive microbial identification is 
desirable for a variety of industrial, medical, environmental, 
quality, and research reasons. Traditionally, the microbiol- 
ogy laboratory has functioned to identify the etiologic agents 
of infectious diseases through direct examination and culture 
of specimens. Since the mid-1980s, researchers have repeat- 
edly demonstrated the practical utility of molecular biology 
techniques, many of which form the basis of clinical diagnostic
assays. Some of these techniques include nucleic acid
hybridization analysis, restriction enzyme analysis, genetic 
sequence analysis, and separation and purification of nucleic 
acids (See, e.g., J. Sambrook, E. F. Fritsch, and T. Maniatis, 
Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold 
Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 
1989). These procedures, in general, are time-consuming 
and tedious. Another option is the polymerase chain reaction 
(PCR) or other amplification procedure which amplifies a 
specific target DNA sequence based on the flanking primers 
used. Finally, detection and data analysis convert the hybrid- 
ization event into an analytical result. 

[0004] Other techniques for detection of bioagents include 
high-resolution mass spectrometry (MS), low-resolution 
MS, fluorescence, radioiodination, DNA chips and antibody 
techniques. None of these techniques is entirely satisfactory. 

[0005] Mass spectrometry provides detailed information 
about the molecules being analyzed, including high mass 
accuracy. It is also a process that can be easily automated. 
However, high-resolution MS alone fails to perform against 
unknown or bioengineered agents, or in environments where 
there is a high background level of bioagents ("cluttered"
background). Low-resolution MS can fail to detect some 
known agents, if their spectral lines are sufficiently weak or 
sufficiently close to those from other living organisms in the 
sample. DNA chips with specific probes can only determine 
the presence or absence of specifically anticipated organ- 
isms. Because there are hundreds of thousands of species of 
benign bacteria, some very similar in sequence to threat 
organisms, even arrays with 10,000 probes lack the breadth 
needed to detect a particular organism. 

[0006] Antibodies face more severe diversity limitations 
than arrays. If antibodies are designed against highly con- 
served targets to increase diversity, the false alarm problem 
will dominate, again because threat organisms are very 
similar to benign ones. Antibodies are only capable of 
detecting known agents in relatively uncluttered environments.



[0007] Several groups have described detection of PCR 
products using high resolution electrospray ionization — 
Fourier transform — ion cyclotron resonance mass spectrom- 
etry (ESI-FT-ICR MS). Accurate measurement of exact 
mass combined with knowledge of the number of at least 
one nucleotide allowed calculation of the total base compo- 
sition for PCR duplex products of approximately 100 base 
pairs. (Aaserud et al., J. Am. Soc. Mass Spec. 7:1266-1269,
1996; Muddiman et al., Anal. Chem. 69:1543-1549, 1997;
Wunschel et al., Anal. Chem. 70:1203-1207, 1998; Muddiman
et al., Rev. Anal. Chem. 17:1-68, 1998). Electrospray
ionization-Fourier transform-ion cyclotron resonance (ESI-FT-ICR)
MS may be used to determine the mass of double-stranded,
500 base-pair PCR products via the average
molecular mass (Hurst et al., Rapid Commun. Mass Spec.
10:377-382, 1996). The use of matrix-assisted laser desorption
ionization-time of flight (MALDI-TOF) mass spectrometry
for characterization of PCR products has been
described. (Muddiman et al., Rapid Commun. Mass Spec.
13:1201-1204, 1999). However, the degradation of DNAs
over about 75 nucleotides observed with MALDI limited the 
utility of this method. 
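
The calculation described in this paragraph can be illustrated with a minimal sketch. It assumes average residue masses for the four deoxynucleotides and the commonly used end correction of -61.96 Da for a strand with free 5' and 3' hydroxyls. These constants, the function names and the 0.5 Da tolerance are illustrative assumptions and are not taken from the cited papers. Given a measured single-strand mass and a known adenosine count, the sketch lists every (G, C, T) count consistent with that mass. The example mass is the 14072.2 value printed on the FIG. 5 sheet for a B. anthracis strand labeled A14 G9 C14 T9.

```python
# Assumed average residue masses (Da) of deoxynucleotide monophosphates in a DNA strand.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
# Assumed correction for a strand with free 5'-OH and 3'-OH ends.
END_CORRECTION = -61.96


def strand_mass(a, g, c, t):
    """Average mass of a single DNA strand with the given base counts."""
    return (a * RESIDUE_MASS["A"] + g * RESIDUE_MASS["G"]
            + c * RESIDUE_MASS["C"] + t * RESIDUE_MASS["T"] + END_CORRECTION)


def candidate_compositions(measured_mass, known_a, tol=0.5, max_other=60):
    """List (G, C, T) counts whose mass, with known_a adenosines, matches within tol."""
    hits = []
    for g in range(max_other + 1):
        for c in range(max_other + 1 - g):
            for t in range(max_other + 1 - g - c):
                if abs(strand_mass(known_a, g, c, t) - measured_mass) <= tol:
                    hits.append((g, c, t))
    return hits


if __name__ == "__main__":
    for g, c, t in candidate_compositions(14072.2, known_a=14):
        print(f"A14 G{g} C{c} T{t}  mass={strand_mass(14, g, c, t):.2f}")
```

More than one composition can fall inside a loose tolerance, which is one reason high mass accuracy and additional constraints are useful.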

[0008] U.S. Pat. No. 5,849,492 describes a method for 
retrieval of phylogenetically informative DNA sequences 
which comprises searching for a highly divergent segment of
genomic DNA surrounded by two highly conserved seg- 
ments, designing the universal primers for PCR amplifica- 
tion of the highly divergent region, amplifying the genomic 
DNA by PCR technique using universal primers, and then 
sequencing the gene to determine the identity of the organism.

[0009] U.S. Pat. No. 5,965,363 discloses methods for 
screening nucleic acids for polymorphisms by analyzing 
amplified target nucleic acids using mass spectrometric 
techniques and to procedures for improving mass resolution 
and mass accuracy of these methods. 

[0010] WO 99/14375 describes methods, PCR primers 
and kits for use in analyzing preselected DNA tandem 
nucleotide repeat alleles by mass spectrometry. 

[0011] WO 98/12355 discloses methods of determining 
the mass of a target nucleic acid by mass spectrometric 
analysis, by cleaving the target nucleic acid to reduce its 
length, making the target single-stranded and using MS to 
determine the mass of the single-stranded shortened target. 
Also disclosed are methods of preparing a double-stranded 
target nucleic acid for MS analysis comprising amplification 
of the target nucleic acid, binding one of the strands to a 
solid support, releasing the second strand and then releasing 
the first strand which is then analyzed by MS. Kits for target 
nucleic acid preparation are also provided. 

[0012] PCT WO97/33000 discloses methods for detecting 
mutations in a target nucleic acid by nonrandomly fragment- 
ing the target into a set of single-stranded nonrandom length 
fragments and determining their masses by MS. 

[0013] U.S. Pat. No. 5,605,798 describes a fast and highly 
accurate mass spectrometer-based process for detecting the 
presence of a particular nucleic acid in a biological sample 
for diagnostic purposes. 

[0014] WO 98/21066 describes processes for determining 
the sequence of a particular target nucleic acid by mass 
spectrometry. Processes for detecting a target nucleic acid 
present in a biological sample by PCR amplification and 
mass spectrometry detection are disclosed, as are methods 
for detecting a target nucleic acid in a sample by amplifying 






the target with primers that contain restriction sites and tags, 
extending and cleaving the amplified nucleic acid, and 
detecting the presence of extended product, wherein the 
presence of a DNA fragment of a mass different from 
wild-type is indicative of a mutation. Methods of sequencing 
a nucleic acid via mass spectrometry methods are also 
described. 

[0015] WO 97/37041, WO 99/31278 and U.S. Pat. No. 
5,547,835 describe methods of sequencing nucleic acids 
using mass spectrometry. U.S. Pat. Nos. 5,622,824, 5,872, 
003 and 5,691,141 describe methods, systems and kits for 
exonuclease-mediated mass spectrometric sequencing. 
[0016] Thus, there is a need for a method for bioagent 
detection and identification which is both specific and rapid, 
and in which no nucleic acid sequencing is required. The 
present invention addresses this need. 

SUMMARY OF THE INVENTION 

[0017] One embodiment of the present invention is a 
method of identifying an unknown bioagent comprising (a) 
contacting nucleic acid from the bioagent with at least one 
pair of oligonucleotide primers which hybridize to 
sequences of the nucleic acid and flank a variable nucleic 
acid sequence; (b) amplifying the variable nucleic acid 
sequence to produce an amplification product; (c) determin- 
ing the molecular mass of the amplification product; and (d) 
comparing the molecular mass to one or more molecular 
masses of amplification products obtained by performing 
steps (a)-(c) on a plurality of known organisms, wherein a 
match identifies the unknown bioagent. In one aspect of this 
preferred embodiment, the sequences to which the at least
one pair of oligonucleotide primers hybridize are highly 
conserved. Preferably, the amplifying step comprises poly- 
merase chain reaction. Alternatively, the amplifying step 
comprises ligase chain reaction or strand displacement 
amplification. In one aspect of this preferred embodiment, 
the bioagent is a bacterium, virus, cell or spore. Advanta- 
geously, the nucleic acid is ribosomal RNA. In another 
aspect, the nucleic acid encodes RNase P or an RNA- 
dependent RNA polymerase. Preferably, the amplification 
product is ionized prior to molecular mass determination. 
The method may further comprise the step of isolating 
nucleic acid from the bioagent prior to contacting the nucleic 
acid with the at least one pair of oligonucleotide primers. 
The method may further comprise the step of performing 
steps (a)-(d) using a different oligonucleotide primer pair 
and comparing the results to one or more molecular mass 
amplification products obtained by performing steps (a)-(c) 
on a different plurality of known organisms from those in 
step (d). Preferably, the one or more molecular mass is 
contained in a database of molecular masses. In another 
aspect of this preferred embodiment, the amplification prod- 
uct is ionized by electrospray ionization, matrix-assisted 
laser desorption or fast atom bombardment. Advanta- 
geously, the molecular mass is determined by mass spec- 
trometry. Preferably, the mass spectrometry is Fourier trans- 
form ion cyclotron resonance mass spectrometry (FT-ICR- 
MS), ion trap, quadrupole, magnetic sector, time of flight 
(TOF), Q-TOF or triple quadrupole. The method may further 
comprise performing step (b) in the presence of an analog of 
adenine, thymidine, guanosine or cytidine having a different 
molecular weight than adenosine, thymidine, guanosine or 
cytidine. In one aspect, the oligonucleotide primer com- 
prises a base analog or substitute base at positions 1 and 2 
of each triplet within the primer, wherein the base analog or 
substitute base binds with increased affinity to its complement
compared to the native base. Preferably, the primer
comprises a universal base at position 3 of each triplet within 
the primer. The base analog or substitute base may be 
2,6-diaminopurine, propyne T, propyne G, phenoxazines or 
G-clamp. Preferably, the universal base is inosine, guani- 
dine, uridine, 5-nitroindole, 3-nitropyrrole, dP or dK, or 
1-(2-deoxy-β-D-ribofuranosyl)-imidazole-4-carboxamide.
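
Step (d) of this embodiment, comparing a measured molecular mass with masses obtained from known organisms, can be sketched as a tolerance lookup. The reference names, the reference masses and the 1.0 Da tolerance below are hypothetical placeholders chosen for illustration; the function name `match_by_mass` is likewise an assumption and not part of the disclosure.

```python
def match_by_mass(measured_mass, reference_masses, tol_da=1.0):
    """Return names of reference entries within tol_da of the measured mass, closest first."""
    hits = [
        (abs(mass - measured_mass), name)
        for name, mass in reference_masses.items()
        if abs(mass - measured_mass) <= tol_da
    ]
    return [name for _, name in sorted(hits)]


if __name__ == "__main__":
    # Hypothetical reference masses (Da) for one primer pair.
    reference = {"organism_A": 14072.2, "organism_B": 14385.4, "organism_C": 14073.0}
    print(match_by_mass(14072.5, reference))
```

When several entries match, repeating the comparison with a different primer pair, as the embodiment describes, is one way to narrow the result.
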
[0018] Another embodiment of the present invention is a 
method of identifying an unknown bioagent comprising (a) 
contacting nucleic acid from the bioagent with at least one 
pair of oligonucleotide primers which hybridize to 
sequences of the nucleic acid and flank a variable nucleic 
acid sequence; (b) amplifying the variable nucleic acid 
sequence to produce an amplification product; (c) determin- 
ing the base composition of the amplification product; and 
(d) comparing the base composition to one or more base 
compositions of amplification products obtained by per- 
forming steps (a)-(c) on a plurality of known organisms, 
wherein a match identifies the unknown bioagent. In one 
aspect of this preferred embodiment, the sequences to which 
the at least one pair of oligonucleotide primers hybridize are 
highly conserved. Preferably, the amplifying step comprises 
polymerase chain reaction. Alternatively, the amplifying 
step comprises ligase chain reaction or strand displacement 
amplification. In one aspect of this preferred embodiment, 
the bioagent is a bacterium, virus, cell or spore. Advanta- 
geously, the nucleic acid is ribosomal RNA. In another 
aspect, the nucleic acid encodes RNase P or an RNA- 
dependent RNA polymerase. Preferably, the amplification 
product is ionized prior to molecular mass determination. 
The method may further comprise the step of isolating 
nucleic acid from the bioagent prior to contacting the nucleic 
acid with the at least one pair of oligonucleotide primers. 
The method may further comprise the step of performing 
steps (a)-(d) using a different oligonucleotide primer pair 
and comparing the results to one or more base composition 
signatures of amplification products obtained by performing 
steps (a)-(c) on a different plurality of known organisms 
from those in step (d). Preferably, the one or more base 
compositions is contained in a database of base composi- 
tions. In another aspect of this preferred embodiment, the 
amplification product is ionized by electrospray ionization, 
matrix-assisted laser desorption or fast atom bombardment. 
Advantageously, the molecular mass is determined by mass 
spectrometry. Preferably, the mass spectrometry is Fourier 
transform ion cyclotron resonance mass spectrometry (FT- 
ICR-MS), ion trap, quadrupole, magnetic sector, time of 
flight (TOF), Q-TOF or triple quadrupole. The method may 
further comprise performing step (b) in the presence of an 
analog of adenine, thymidine, guanosine or cytidine having 
a different molecular weight than adenosine, thymidine, 
guanosine or cytidine. In one aspect, the oligonucleotide 
primer comprises a base analog or substitute base at posi- 
tions 1 and 2 of each triplet within the primer, wherein the 
base analog or substitute base binds with increased affinity 
to its complement compared to the native base. Preferably, 
the primer comprises a universal base at position 3 of each 
triplet within the primer. The base analog or substitute base 
may be 2,6-diaminopurine, propyne T, propyne G, phenox- 
azines or G-clamp. Preferably, the universal base is inosine, 
guanidine, uridine, 5-nitroindole, 3-nitropyrrole, dP or dK, 
or 1-(2-deoxy-β-D-ribofuranosyl)-imidazole-4-carboxamide.

[0019] The present invention also provides a method for 
detecting a single nucleotide polymorphism in an individual, 
comprising the steps of (a) isolating nucleic acid from the 
individual; (b) contacting the nucleic acid with oligonucleotide
primers which hybridize to regions of the nucleic acid
which flank a region comprising the potential polymorphism;
(c) amplifying the region to produce an amplification
product; (d) determining the molecular mass of the amplification
product; and (e) comparing the molecular mass to
the molecular mass of the region in an individual known to 
have the polymorphism, wherein if the molecular masses are 
the same then the individual has the polymorphism. 
[0020] In one aspect of this preferred embodiment, the 
primers hybridize to highly conserved sequences. Prefer- 
ably, the polymorphism is associated with a disease. Alter- 
natively, the polymorphism is a blood group antigen. In one 
aspect of the preferred embodiment, the amplifying step is 
polymerase chain reaction. Alternatively, the amplification 
step is ligase chain reaction or strand displacement ampli- 
fication. Preferably, the amplification product is ionized 
prior to mass determination. In one aspect, the amplification 
product is ionized by electrospray ionization, matrix-as- 
sisted laser desorption or fast atom bombardment. Advan- 
tageously, the molecular mass is determined by mass spec- 
trometry. Preferably, the mass spectrometry is Fourier 
transform ion cyclotron resonance mass spectrometry (FT-ICR-MS),
ion trap, quadrupole, magnetic sector, time of
flight (TOF), Q-TOF or triple quadrupole. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0021] FIGS. 1A-1I are consensus diagrams that show 
examples of conserved regions from 16S rRNA (FIGS. 
1A-1B), 23S rRNA (3'-half, FIGS. 1C-1D; 5'-half, FIGS. 
1E-F), 23S rRNA Domain I (FIG. 1G), 23S rRNA Domain 
IV (FIG. 1H) and 16S rRNA Domain III (FIG. 1I) which
are suitable for use in the present invention. Lines with 
arrows are examples of regions to which intelligent primer 
pairs for PCR are designed. The label for each primer pair 
represents the starting and ending base number of the 
amplified region on the consensus diagram. Bases in capital
letters are greater than 95% conserved; bases in lower case
letters are 90-95% conserved; filled circles are 80-90%
conserved; and open circles are less than 80% conserved.

[0022] FIG. 2 shows a typical primer amplified region 
from the 16S rRNA Domain III shown in FIG. 1I.
[0023] FIG. 3 is a schematic diagram showing conserved 
regions in RNase P. Bases in capital letters are greater than 
90% conserved; bases in lower case letters are 80-90% 
conserved; filled circles designate bases which are 70-80%
conserved; and open circles designate bases that are less 
than 70% conserved. 

[0024] FIG. 4 is a schematic diagram of base composition 
signature determination using nucleotide analog "tags" to 
determine base composition signatures. 
[0025] FIG. 5 shows the deconvoluted mass spectra of a 
Bacillus anthracis region with and without the mass tag 
phosphorothioate A (A*). The two spectra differ in that the 
measured molecular weight of the mass tag-containing 
sequence is greater than the unmodified sequence. 
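
The size of the shift in FIG. 5 can be checked with simple arithmetic. A phosphorothioate linkage replaces one non-bridging phosphate oxygen with sulfur, so each tagged residue should add roughly the sulfur-minus-oxygen mass difference. The sketch below assumes average atomic masses (O = 15.999, S = 32.06) and uses the two molecular weights printed on the FIG. 5 sheet (14072.2 and 14280.9); the variable and function names are illustrative.

```python
# Assumed average atomic masses (Da).
OXYGEN_MASS = 15.999
SULFUR_MASS = 32.06

# Molecular weights as printed on the FIG. 5 sheet.
MW_UNTAGGED = 14072.2
MW_TAGGED = 14280.9


def estimated_tag_count(mw_tagged, mw_untagged):
    """Estimate how many residues carry the phosphorothioate tag from the mass shift."""
    shift_per_tag = SULFUR_MASS - OXYGEN_MASS  # one O replaced by one S
    return round((mw_tagged - mw_untagged) / shift_per_tag)


if __name__ == "__main__":
    print("observed shift (Da):", round(MW_TAGGED - MW_UNTAGGED, 1))
    print("estimated tagged residues:", estimated_tag_count(MW_TAGGED, MW_UNTAGGED))
```

Under these assumptions the observed shift corresponds to about 13 tagged adenosines, so the mass difference counts the incorporated A residues.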

[0026] FIG. 6 shows base composition signature (BCS) 
spectra from PCR products from Staphylococcus aureus (S.
aureus 16S_1337F) and Bacillus anthracis (B. anthr. 16S_1337F),
amplified using the same primers. The two strands
differ by only two (AT→CG) substitutions and are clearly
distinguished on the basis of their BCS. 

[0027] FIG. 7 shows that a single difference between two 
sequences (A14 in B. anthracis vs. A15 in B. cereus) can be
easily detected using ESI-TOF mass spectrometry. 

[0028] FIG. 8 is an ESI-TOF of Bacillus anthracis spore 
coat protein sspE 56mer plus calibrant. The signals unambiguously
identify B. anthracis versus other Bacillus species.

[0029] FIG. 9 is an ESI-TOF of a B. anthracis synthetic 
16S_1228 duplex (reverse and forward strands). The tech- 
nique easily distinguishes between the forward and reverse 
strands. 

[0030] FIG. 10 is an ESI-FTICR-MS of a synthetic B. 
anthracis 16S_1337 46 base pair duplex. 

[0031] FIG. 11 is an ESI-TOF-MS of a 56mer oligonucle- 
otide (3 scans) from the B. anthracis saspB gene with an 
internal mass standard. The internal mass standards are 
designated by asterisks. 

[0032] FIG. 12 is an ESI-TOF-MS of an internal standard 
with 5 mM TBA-TFA buffer showing that charge stripping 
with tributylammonium trifluoroacetate reduces the most 
abundant charge state from [M-8H+]8- to [M-3H+]3-.

DETAILED DESCRIPTION OF THE 
INVENTION 

[0033] The present invention provides a combination of a 
non-PCR biomass detection mode, preferably high-resolu- 
tion MS, with PCR-based BCS technology using "intelligent 
primers" which hybridize to conserved sequence regions of 
nucleic acids derived from a bioagent and which bracket 
variable sequence regions that uniquely identify the bioag- 
ent. The high-resolution MS technique is used to determine 
the molecular mass and base composition signature (BCS) 
of the amplified sequence region. This unique "base composition
signature" (BCS) is then input to a maximum-likelihood
detection algorithm for matching against a database
of base composition signatures in the same amplified
region. The present method combines PCR-based amplifi- 
cation technology (which provides specificity) and a 
molecular mass detection mode (which provides speed and 
does not require nucleic acid sequencing of the amplified 
target sequence) for bioagent detection and identification. 

[0034] The present method allows extremely rapid and 
accurate detection and identification of bioagents compared 
to existing methods. Furthermore, this rapid detection and 
identification is possible even when sample material is 
impure. Thus, the method is useful in a wide variety of 
fields, including, but not limited to, environmental testing 
(e.g., detection and discrimination of pathogenic vs. non- 
pathogenic bacteria in water or other samples), germ warfare 
(allowing immediate identification of the bioagent and 
appropriate treatment), pharmacogenetic analysis and medical
diagnosis (including cancer diagnosis based on mutations
and polymorphisms; drug resistance and susceptibility
testing; screening for and/or diagnosis of genetic diseases 
and conditions; and diagnosis of infectious diseases and 
conditions). The method leverages ongoing biomedical 
research in virulence, pathogenicity, drug resistance and 
genome sequencing into a method which provides greatly 
improved sensitivity, specificity and reliability compared to
existing methods, with lower rates of false positives. 






[0035] The present method can be used to detect and 
classify any biological agent, including bacteria, viruses, 
fungi and toxins. As one example, where the agent is a 
biological threat, the information obtained is used to deter- 
mine practical information needed for countermeasures, 
including toxin genes, pathogenicity islands and antibiotic 
resistance genes. In addition, the methods can be used to 
identify natural or deliberate engineering events including 
chromosome fragment swapping, molecular breeding (gene 
shuffling) and emerging infectious diseases. 
[0036] Bacteria have a common set of absolutely required 
genes. About 250 genes are present in all bacterial species 
(Proc. Natl. Acad. Sci. USA. 93:10268, 1996; Science 
270:397, 1995), including tiny genomes like Mycoplasma, 
Ureaplasma and Rickettsia. These genes encode proteins 
involved in translation, replication, recombination and 
repair, transcription, nucleotide metabolism, amino acid 
metabolism, lipid metabolism, energy generation, uptake, 
secretion and the like. Examples of these proteins are DNA 
polymerase III beta, elongation factor TU, heat shock pro- 
tein groEL, RNA polymerase beta, phosphoglycerate kinase, 
NADH dehydrogenase, DNA ligase, DNA topoisomerase
and elongation factor G. Operons can also be targeted using 
the present method. One example of an operon is the bfp 
operon from enteropathogenic E. coli. Multiple core chromosomal
genes can be used to classify bacteria at a genus or
genus species level to determine if an organism has threat 
potential. The method can also be used to detect pathoge- 
nicity markers (plasmid or chromosomal) and antibiotic 
resistance genes to confirm the threat potential of an organ- 
ism and to direct countermeasures. 

[0037] A theoretically ideal bioagent detector would iden- 
tify, quantify, and report the complete nucleic acid sequence 
of every bioagent that reached the sensor. The complete 
sequence of the nucleic acid component of a pathogen would 
provide all relevant information about the threat, including 
its identity and the presence of drug-resistance or pathoge- 
nicity markers. This ideal has not yet been achieved. How- 
ever, the present invention provides a straightforward strat- 
egy for obtaining information with the same practical value 
using base composition signatures (BCS). While the base 
composition of a gene fragment is not as information-rich as 
the sequence itself, there is no need to analyze the complete 
sequence of the gene if the short analyte sequence fragment 
is properly chosen. A database of reference sequences can be 
prepared in which each sequence is indexed to a unique base 
composition signature, so that the presence of the sequence 
can be inferred with accuracy from the presence of the 
signature. The advantage of base composition signatures is 
that they can be quantitatively measured in a massively 
parallel fashion using multiplex PCR (PCR in which two or 
more primer pairs amplify target sequences simultaneously) 
and mass spectrometry. These multiple primer amplified 
regions uniquely identify most threat and ubiquitous back- 
ground bacteria and viruses. In addition, cluster-specific 
primer pairs distinguish important local clusters (e.g., 
anthracis group). 
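
The idea of indexing reference sequences by base composition can be sketched as follows. The reference names and the short sequences are hypothetical toy data, and the function names are assumptions made for illustration. The sketch counts A, C, G and T in each reference fragment and groups references by the resulting signature.

```python
from collections import Counter, defaultdict


def base_composition(sequence):
    """Return the (A, C, G, T) counts of a DNA sequence."""
    counts = Counter(sequence.upper())
    return tuple(counts.get(base, 0) for base in "ACGT")


def build_signature_index(references):
    """Map each base composition signature to the reference names that share it."""
    index = defaultdict(list)
    for name, sequence in references.items():
        index[base_composition(sequence)].append(name)
    return index


if __name__ == "__main__":
    # Hypothetical fragments; real amplified regions would be far longer.
    references = {
        "ref_1": "ACGTACGTAA",
        "ref_2": "ACGTACGTAC",
        "ref_3": "AATGCATGCA",
    }
    for signature, names in build_signature_index(references).items():
        print(signature, names)
```

In this toy data two different sequences share one signature, which shows why the analyte fragment has to be chosen so that signatures in the database stay unique.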

[0038] In the context of this invention, a "bioagent" is any 
organism, living or dead, or a nucleic acid derived from such 
an organism. Examples of bioagents include but are not 
limited to cells (including but not limited to human clinical
samples, bacterial cells and other pathogens), viruses, toxin
genes and bioregulating compounds. Samples may be alive
or dead or in a vegetative state (for example, vegetative
bacteria or spores) and may be encapsulated or bioengi- 
neered. 

[0039] As used herein, a "base composition signature"
(BCS) is the exact base composition from selected fragments
of nucleic acid sequences that uniquely identifies the
target gene and source organism. BCS can be thought of as 
unique indexes of specific genes. 

[0040] As used herein, "intelligent primers" are primers
which bind to sequence regions which flank an intervening 
variable region. In a preferred embodiment, these sequence 
regions which flank the variable region are highly conserved 
among different species of bioagent. For example, the 
sequence regions may be highly conserved among all Bacillus
species. By the term "highly conserved," it is meant that
the sequence regions exhibit between about 80-100%, more
preferably between about 90-100% and most preferably
between about 95-100% identity. Examples of intelligent
primers which amplify regions of the 16S and 23S rRNA are
shown in FIGS. 1A-1I. A typical primer amplified region in
16S rRNA is shown in FIG. 2. The arrows represent primers
which bind to highly conserved regions which flank a
variable region in 16S rRNA domain III. The amplified
region is the stem-loop structure under "1100-1188."
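
The identity ranges given for "highly conserved" regions can be made concrete with a small sketch. It assumes the two sequences have already been aligned to equal length with no gaps, which is a simplification; the function names and the tier labels are illustrative only.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions at which two pre-aligned, equal-length sequences agree."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def conservation_tier(pct):
    """Report the tightest identity range from the text that the value satisfies."""
    if pct >= 95.0:
        return "most preferred (about 95-100%)"
    if pct >= 90.0:
        return "more preferred (about 90-100%)"
    if pct >= 80.0:
        return "highly conserved (about 80-100%)"
    return "not highly conserved"


if __name__ == "__main__":
    pct = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(pct, conservation_tier(pct))
```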

[0041] One main advantage of the detection methods of 
the present invention is that the primers need not be specific 
for a particular bacterial species, or even genus, such as 
Bacillus or Streptomyces. Instead, the primers recognize 
highly conserved regions across hundreds of bacterial spe- 
cies including, but not limited to, the species described 
herein. Thus, the same primer pair can be used to identify 
any desired bacterium because it will bind to the conserved 
regions which flank a variable region specific to a single 
species, or common to several bacterial species, allowing 
nucleic acid amplification of the intervening sequence and 
determination of its molecular weight and base composition. 
For example, the 16S_971-1062, 16S_1228-1310 and 
16S_1100-1188 regions are 98-99% conserved in about 900 
species of bacteria (16S=16S rRNA, numbers indicate 
nucleotide position). In one embodiment of the present 
invention, primers used in the present method bind to one or 
more of these regions or portions thereof. 
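
How one primer pair can serve many species can be illustrated with an in-silico amplification sketch. The primer and template sequences below are hypothetical toy strings, and the sketch assumes exact primer matches on the forward strand, which real primers do not require. Two templates share the conserved flanking sites but differ in the intervening variable region, so the same primers yield different products.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def amplified_region(template, forward_primer, reverse_primer):
    """Return the product bounded by the two primers, or None if either site is absent."""
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start < 0:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end < 0:
        return None
    return template[start:end + len(reverse_site)]


if __name__ == "__main__":
    # Hypothetical conserved primer sites flanking a variable region.
    forward, reverse = "GGATCC", "TTGCAC"
    species_1 = "TTTGGATCCACGTACGTGCAACCC"
    species_2 = "TTTGGATCCACTTACGTGCAACCC"
    print(amplified_region(species_1, forward, reverse))
    print(amplified_region(species_2, forward, reverse))
```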

[0042] The present invention provides a combination of a 
non-PCR biomass detection mode, preferably high-resolu- 
tion MS, with nucleic acid amplification-based BCS tech- 
nology using "intelligent primers" which hybridize to con- 
served regions and which bracket variable regions that 
uniquely identify the bioagent(s). Although the use of PCR 
is preferred, other nucleic acid amplification techniques may 
also be used, including ligase chain reaction (LCR) and 
strand displacement amplification (SDA). The high-resolu- 
tion MS technique allows separation of bioagent spectral 
lines from background spectral lines in highly cluttered 
environments. The resolved spectral lines are then translated 
to BCS which are input to a maximum-likelihood detection 
algorithm matched against spectra for one or more known 
BCS. Preferably, the bioagent BCS spectrum is matched 
against one or more databases of BCS from vast numbers of 
bioagents. Preferably, the matching is done using a maxi- 
mum-likelihood detection algorithm. 






[0043] In a preferred embodiment, base composition sig- 
natures are quantitatively measured in a massively parallel 
fashion using the polymerase chain reaction (PCR), prefer- 
ably multiplex PCR, and mass spectrometric (MS) methods. 
Sufficient quantities of nucleic acids must be present for 
detection of bioagents by MS. A wide variety of techniques 
for preparing large amounts of purified nucleic acids or 
fragments thereof are well known to those of skill in the art. 
PCR requires one or more pairs of oligonucleotide primers 
which bind to regions which flank the target sequence(s) to 
be amplified. Each primer primes synthesis of a different 
strand of DNA, with synthesis occurring in the direction of 
one primer towards the other primer. The primers, DNA to 
be amplified, a thermostable DNA polymerase (e.g. Taq 
polymerase), the four deoxynucleotide triphosphates, and a 
buffer are combined to initiate DNA synthesis. The solution 
is denatured by heating, then cooled to allow annealing of 
newly added primer, followed by another round of DNA 
synthesis. This process is typically repeated for about 30 
cycles, resulting in amplification of the target sequence. 
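The effect of repeated cycling can be sketched with the usual idealized exponential model, in which each cycle multiplies the number of target copies by (1 + efficiency). The function name and the efficiency parameter are illustrative assumptions; real reactions plateau as reagents are consumed.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized PCR yield: copies multiply by (1 + efficiency) per cycle."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles

# One template molecule after about 30 ideal cycles.
print(pcr_copies(1, 30))
```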

[0044] The "intelligent primers" define the target sequence 
region to be amplified and analyzed. In one embodiment, the 
target sequence is a ribosomal RNA (rRNA) gene sequence. 
With the complete sequences of many of the smallest 
microbial genomes now available, it is possible to identify 
a set of genes that defines "minimal life" and identify 
composition signatures that uniquely identify each gene and 
organism. Genes that encode core life functions such as 
DNA replication, transcription, ribosome structure, transla- 
tion, and transport are distributed broadly in the bacterial 
genome and are preferred regions for BCS analysis. Ribo- 
somal RNA (rRNA) genes comprise regions that provide 
useful base composition signatures. Like many genes 
involved in core life functions, rRNA genes contain 
sequences that are extraordinarily conserved across bacterial 
domains interspersed with regions of high variability that are 
more specific to each species. The variable regions can be 
utilized to build a database of base composition signatures. 
The strategy involves creating a structure-based alignment 
of sequences of the small (16S) and the large (23S) subunits 
of the rRNA genes. For example, there are currently over 
13,000 sequences in the ribosomal RNA database that has 
been created and maintained by Robin Gutell, University of 
Texas at Austin, and is publicly available on the Institute for 
Cellular and Molecular Biology web page 
(http://www.rna.icmb.utexas.edu). There is also a publicly available rRNA 
database created and maintained by the University of Antw- 
erp, Belgium at http://www.rrna.uia.ac.be. 

[0045] These databases have been analyzed to determine 
regions that are useful as base composition signatures. The 
characteristics of such regions are: a) between about 80 and 
100%, preferably >about 95% identity among species of the 
particular bioagent of interest, of upstream and downstream 
nucleotide sequences which serve as sequence amplification 
primer sites; b) an intervening variable region which exhib- 
its no greater than about 5% identity among species; and c) 
a separation of between about 30 and 1000 nucleotides, 
preferably no more than about 50-250 nucleotides, and more 
preferably no more than about 60-100 nucleotides, between 
the conserved regions. 
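The three characteristics listed above can be expressed as a simple screen over aligned sequences. This is only a sketch: the definition of "identity" used here (the fraction of alignment columns in which every sequence agrees) and the function names are assumptions, since the text does not specify how identity is computed.

```python
def column_identity(aligned):
    """Fraction of alignment columns in which all sequences share one base."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    same = sum(1 for col in zip(*aligned) if len(set(col)) == 1)
    return same / length

def is_candidate_region(upstream, variable, downstream):
    """Apply criteria a) to c): conserved flanks, variable middle, 30-1000 nt apart."""
    flanks_ok = (column_identity(upstream) >= 0.80
                 and column_identity(downstream) >= 0.80)
    variable_ok = column_identity(variable) <= 0.05
    separation_ok = 30 <= len(variable[0]) <= 1000
    return flanks_ok and variable_ok and separation_ok

# Hypothetical toy alignment of three species.
upstream = ["GATTACAGGC"] * 3
downstream = ["CCGTTAGCAA"] * 3
variable = ["A" * 40, "C" * 40, "G" * 40]
print(is_candidate_region(upstream, variable, downstream))
```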

[0046] Due to their overall conservation, the flanking 
rRNA primer sequences serve as good "universal" primer 
binding sites to amplify the region of interest for most, if not 
all, bacterial species. The intervening region between the 
sets of primers varies in length and/or composition, and thus 
provides a unique base composition signature. 
[0047] It is advantageous to design the "intelligent prim- 
ers" to be as universal as possible to minimize the number 
of primers which need to be synthesized, and to allow 
detection of multiple species using a single pair of primers. 
These primer pairs can be used to amplify variable regions 
in these species. Because any variation (due to codon 
wobble in the 3rd position) in these conserved regions 
among species is likely to occur in the third position of a 
DNA triplet, oligonucleotide primers can be designed such 
that the nucleotide corresponding to this position is a base 
which can bind to more than one nucleotide, referred to 
herein as a "universal base". For example, under this 
"wobble" pairing, inosine (I) binds to U, C or A; guanine (G) 
binds to U or C, and uridine (U) binds to U or C. Other 
examples of universal bases include nitroindoles such as 
5-nitroindole or 3-nitropyrrole (Loakes et al., Nucleosides 
and Nucleotides 14:1001-1003, 1995), the degenerate nucle- 
otides dP or dK (Hill et al.), an acyclic nucleoside analog 
containing 5-nitroindazole (Van Aerschot et al., Nucleosides 
and Nucleotides 14:1053-1056, 1995) or the purine analog 
1-(2-deoxy-β-D-ribofuranosyl)-imidazole-4-carboxamide 
(Sala et al., Nucl. Acids Res. 24:3302-3306, 1996). 
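A minimal sketch of how a "universal base" widens the set of sites a primer can bind is given below. Only the inosine pairing stated above is modeled (U is written as T because the template here is DNA); the pairing table, the function name and the example sequences are illustrative assumptions.

```python
# Template bases that each primer base is assumed to pair with.
PAIRS = {
    "A": {"T"},
    "T": {"A"},
    "G": {"C"},
    "C": {"G"},
    "I": {"A", "C", "T"},  # inosine treated as a universal base
}

def primer_binds(primer, site):
    """True if primer (5'->3') can pair with site (template strand, 5'->3').

    The two strands are antiparallel, so the primer is compared against
    the reversed site.
    """
    if len(primer) != len(site):
        return False
    return all(t in PAIRS[p] for p, t in zip(primer, reversed(site)))

# One inosine-containing primer covers three variants of the wobble position.
for site in ("ATAGC", "ATCGC", "ATTGC", "ATGGC"):
    print(site, primer_binds("GCIAT", site))
```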
[0048] In another embodiment of the invention, to com- 
pensate for the somewhat weaker binding by the "wobble" 
base, the oligonucleotide primers are designed such that the 
first and second positions of each triplet are occupied by 
nucleotide analogs which bind with greater affinity than the 
unmodified nucleotide. Examples of these analogs are 2,6- 
diaminopurine which binds to thymine, propyne T which 
binds to adenine and propyne C and phenoxazines, including 
G-clamp, which binds to G. Propynes are described in U.S. 
Pat. Nos. 5,645,985, 5,830,653 and 5,484,908, the entire 
contents of which are incorporated herein by reference. 
Phenoxazines are described in U.S. Pat. Nos. 5,502,177, 
5,763,588, and 6,005,096, the entire contents of which are 
incorporated herein by reference. G-clamps are described in 
U.S. Pat. Nos. 6,007,992 and 6,028,183, the entire contents 
of which are incorporated herein by reference. 
[0049] Bacterial biological warfare agents capable of 
being detected by the present methods include Bacillus 
anthracis (anthrax), Yersinia pestis (pneumonic plague), 
Francisella tularensis (tularemia), Brucella suis, Brucella 
abortus, Brucella melitensis (undulant fever), Burkholderia 
mallei (glanders), Burkholderia pseudomallei (melioido- 
sis), Salmonella typhi (typhoid fever), Rickettsia typhi (epi- 
demic typhus), Rickettsia prowazekii (endemic typhus) and 
Coxiella burnetii (Q fever), Rhodobacter capsulatus, 
Chlamydia pneumoniae, Escherichia coli, Shigella dysente- 
riae, Shigella flexneri, Bacillus cereus, Clostridium botuli- 
num, Coxiella burnetii, Pseudomonas aeruginosa, 
Legionella pneumophila, and Vibrio cholerae. 
[0050] Besides 16S and 23S rRNA, other target regions 
suitable for use in the present invention for detection of 
bacteria include 5S rRNA and RNase P (FIG. 3). 
[0051] Biological warfare fungus agents include 
Coccidioides immitis (coccidioidomycosis). 
[0052] Biological warfare toxin genes capable of being 
detected by the methods of the present invention include 
botulism, T-2 mycotoxins, ricin, staph enterotoxin B, shiga- 
toxin, abrin, aflatoxin, Clostridium perfringens epsilon 
toxin, conotoxins, diacetoxyscirpenol, tetrodotoxin and saxitoxin. 



[0053] Biological warfare viral threat agents are mostly 
RNA viruses (positive-strand and negative-strand), with the 
exception of smallpox. Every RNA virus is a family of 
related viruses (quasispecies). These viruses mutate rapidly 
and the potential for engineered strains (natural or deliber- 
ate) is very high. RNA viruses cluster into families that have 
conserved RNA structural domains on the viral genome 
(e.g., virion components, accessory proteins) and conserved 
housekeeping genes that encode core viral proteins includ- 
ing, for single strand positive strand RNA viruses, RNA- 
dependent RNA polymerase, double stranded RNA helicase, 
chymotrypsin-like and papain-like proteases and methyl- 
transferases. 

[0054] Examples of (-)-strand RNA viruses include 
arenaviruses (e.g., sabia virus, lassa fever, Machupo, Argen- 
tine hemorrhagic fever, flexal virus), bunyaviruses (e.g., 
hantavirus, nairovirus, phlebovirus, hantaan virus, Congo- 
crimean hemorrhagic fever, rift valley fever), and monon- 
egavirales (e.g., filovirus, paramyxovirus, ebola virus, Mar- 
burg, equine morbillivirus). 

[0055] Examples of (+)-strand RNA viruses include picor- 
naviruses (e.g., coxsackievirus, echovirus, human coxsack- 
ievirus A, human echovirus, human enterovirus, human 
poliovirus, hepatitis A virus, human parechovirus, human 
rhinovirus), astroviruses (e.g., human astrovirus), calici- 
viruses (e.g., chiba virus, chitta virus, human calicivirus, 
norwalk virus), nidovirales (e.g., human coronavirus, human 
torovirus), flaviviruses (e.g., dengue virus 1-4, Japanese 
encephalitis virus, Kyasanur forest disease virus, Murray 
Valley encephalitis virus, Rocio virus, St. Louis encephalitis 
virus, West Nile virus, yellow fever virus, hepatitis C virus) 
and togaviruses (e.g., Chikungunya virus, Eastern equine 
encephalitis virus, Mayaro virus, O'nyong-nyong virus, 
Ross River virus, Venezuelan equine encephalitis virus, 
Rubella virus, hepatitis E virus). The hepatitis C virus has a 
5'-untranslated region of 340 nucleotides, an open reading 
frame encoding 9 proteins having 3010 amino acids and a 
3'-untranslated region of 240 nucleotides. The 5'-UTR and 
3'-UTR are 99% conserved in hepatitis C viruses. 

[0056] In one embodiment, the target gene is an RNA- 
dependent RNA polymerase or a helicase encoded by (+)- 
strand RNA viruses, or RNA polymerase from a (-)-strand 
RNA virus. (+)-strand RNA viruses are double stranded 
RNA and replicate by RNA-directed RNA synthesis using 
RNA-dependent RNA polymerase and the positive strand as 
a template. Helicase unwinds the RNA duplex to allow 
replication of the single stranded RNA. These viruses 
include viruses from the family picornaviridae (e.g., polio- 
virus, coxsackievirus, echovirus), togaviridae (e.g., alphavi- 
rus, flavivirus, rubivirus), arenaviridae (e.g., lymphocytic 
choriomeningitis virus, lassa fever virus), coronaviridae 
(e.g., human respiratory virus) and Hepatitis A virus. The 
genes encoding these proteins comprise variable and highly 
conserved regions which flank the variable regions. 

[0057] In a preferred embodiment, the detection scheme 
for the PCR products generated from the bioagent(s) incor- 
porates three features. First, the technique simultaneously 
detects and differentiates multiple (generally about 6-10) 
PCR products. Second, the technique provides a BCS that 
uniquely identifies the bioagent from the possible primer 
sites. Finally, the detection technique is rapid, allowing 
multiple PCR reactions to be run in parallel. 



[0058] In one embodiment, the method can be used to 
detect the presence of antibiotic resistance and/or toxin 
genes in a bacterial species. For example, Bacillus anthracis 
comprising a tetracycline resistance plasmid and plasmids 
encoding one or both anthracis toxins (pXO1 and/or pXO2) 
can be detected by using antibiotic resistance primer sets and 
toxin gene primer sets. If the B. anthracis is positive for 
tetracycline resistance, then a different antibiotic, for 
example quinolone, is used. 

[0059] Mass spectrometry (MS)-based detection of PCR 
products provides all of these features with additional 
advantages. MS is intrinsically a parallel detection scheme 
without the need for radioactive or fluorescent labels, since 
every amplification product with a unique base composition 
is identified by its molecular mass. The current state of the 
art in mass spectrometry is such that less than femtomole 
quantities of material can be readily analyzed to afford 
information about the molecular contents of the sample. An 
accurate assessment of the molecular mass of the material 
can be quickly obtained, irrespective of whether the molecu- 
lar weight of the sample is several hundred, or in excess of 
one hundred thousand atomic mass units (amu) or Daltons. 
Intact molecular ions can be generated from amplification 
products using one of a variety of ionization techniques to 
convert the sample to gas phase. These ionization methods 
include, but are not limited to, electrospray ionization (ES), 
matrix-assisted laser desorption ionization (MALDI) and 
fast atom bombardment (FAB). For example, MALDI of 
nucleic acids, along with examples of matrices for use in 
MALDI of nucleic acids, are described in WO 98/54751 
(Genetrace, Inc.). 

[0060] Upon ionization, several peaks are observed from 
one sample due to the formation of ions with different 
charges. Averaging the multiple readings of molecular mass 
obtained from a single mass spectrum affords an estimate of 
molecular mass of the bioagent. Electrospray ionization 
mass spectrometry (ESI-MS) is particularly useful for very 
high molecular weight polymers such as proteins and 
nucleic acids having molecular weights greater than 10 kDa, 
since it yields a distribution of multiply-charged molecules 
of the sample without causing a significant amount of 
fragmentation. The mass detectors used in the methods of 
the present invention include, but are not limited to, Fourier 
transform ion cyclotron resonance mass spectrometry (FT- 
ICR-MS), ion trap, quadrupole, magnetic sector, time of 
flight (TOF), Q-TOF, and triple quadrupole. 
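The averaging step described in this paragraph can be sketched as follows. Each peak of a multiply-charged series gives its own reading of the neutral mass, and the readings are averaged. The sketch assumes negative-ion mode, where an ion of charge z has lost z protons, and assumes the charge state of each peak is already known; the function names are illustrative.

```python
PROTON_MASS = 1.007276  # Da

def neutral_mass(mz, charge):
    """Neutral mass implied by an [M - zH]z- peak (negative-ion mode assumed)."""
    return charge * (mz + PROTON_MASS)

def average_mass(peaks):
    """Average the mass readings from a list of (m/z, charge) peaks."""
    masses = [neutral_mass(mz, z) for mz, z in peaks]
    return sum(masses) / len(masses)

# Synthetic peaks generated from a 14072.26 Da species at charges 8, 9 and 10.
true_mass = 14072.26
peaks = [(true_mass / z - PROTON_MASS, z) for z in (8, 9, 10)]
print(average_mass(peaks))
```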

[0061] In general, the mass spectrometric techniques 
which can be used in the present invention include, but are 
not limited to, tandem mass spectrometry, infrared multipho- 
ton dissociation and pyrolytic gas chromatography mass 
spectrometry (PGC-MS). In one embodiment of the inven- 
tion, the bioagent detection system operates continually in 
bioagent detection mode using pyrolytic GC-MS without 
PCR for rapid detection of increases in biomass (for 
example, increases in fecal contamination of drinking water 
or of germ warfare agents). To achieve minimal latency, a 
continuous sample stream flows directly into the PGC-MS 
combustion chamber. When an increase in biomass is 
detected, a PCR process is automatically initiated. Bioagent 
presence produces elevated levels of large molecular frag- 
ments from 100-7,000 Da which are observed in the PGC- 
MS spectrum. The observed mass spectrum is compared to 
a threshold level and when levels of biomass are determined 
to exceed a predetermined threshold, the bioagent classifi- 
cation process described hereinabove (combining PCR and 
MS, preferably FT-ICR MS) is initiated. Optionally, alarms 
or other processes (halting ventilation flow, physical isola- 
tion) are also initiated by this detected biomass level. 

[0062] The accurate measurement of molecular mass for 
large DNAs is limited by the adduction of cations from the 
PCR reaction to each strand, resolution of the isotopic peaks 
from natural abundance ¹³C and ¹⁵N isotopes, and assign- 
ment of the charge state for any ion. The cations are removed 
by in-line dialysis using a flow-through chip that brings the 
solution containing the PCR products into contact with a 
solution containing ammonium acetate in the presence of an 
electric field gradient orthogonal to the flow. The latter two 
problems are addressed by operating with a resolving power 
of >100,000 and by incorporating isotopically depleted 
nucleotide triphosphates into the DNA. The resolving power 
of the instrument is also a consideration. At a resolving 
power of 10,000, the modeled signal from the [M-14H⁺]¹⁴⁻ 
charge state of an 84mer PCR product is poorly character- 
ized and assignment of the charge state or exact mass is 
impossible. At a resolving power of 33,000, the peaks from 
the individual isotopic components are visible. At a resolv- 
ing power of 100,000, the isotopic peaks are resolved to the 
baseline and assignment of the charge state for the ion is 
straightforward. The [¹³C, ¹⁵N]-depleted triphosphates are 
obtained, for example, by growing microorganisms on 
depleted media and harvesting the nucleotides (Batey et al., 
Nucl. Acids Res. 20:4515-4523, 1992). 
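The dependence on resolving power can be checked with a short calculation. Adjacent isotopic peaks of an ion with charge z are spaced about 1.00335/z apart in m/z, while the peak width (FWHM) at resolving power R is (m/z)/R. The sketch below compares the two for the three resolving powers mentioned above. The 26,000 Da mass used for the 84mer is an assumed round figure (roughly 309 Da per nucleotide), not a value from the text, and the proton mass is neglected.

```python
ISOTOPE_SPACING_DA = 1.00335  # approximate 13C - 12C mass difference

def peak_width(mz, resolving_power):
    """FWHM implied by the definition R = (m/z) / FWHM."""
    return mz / resolving_power

def spacing_to_width_ratio(mass, charge, resolving_power):
    """Isotopic peak spacing divided by peak width; above 1 means peaks separate."""
    mz = mass / charge
    spacing = ISOTOPE_SPACING_DA / charge
    return spacing / peak_width(mz, resolving_power)

ASSUMED_84MER_MASS = 26000.0  # hypothetical value for illustration
for r in (10_000, 33_000, 100_000):
    print(r, round(spacing_to_width_ratio(ASSUMED_84MER_MASS, 14, r), 2))
```

Under these assumptions the ratio is below 1 at a resolving power of 10,000, slightly above 1 at 33,000 and well above 1 at 100,000, which is consistent with the progression described in the paragraph.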

[0063] While mass measurements of intact nucleic acid 
regions are believed to be adequate to determine most 
bioagents, tandem mass spectrometry (MSⁿ) techniques may 
provide more definitive information pertaining to molecular 
identity or sequence. Tandem MS involves the coupled use 
of two or more stages of mass analysis where both the 
separation and detection steps are based on mass spectrom- 
etry. The first stage is used to select an ion or component of 
a sample from which further structural information is to be 
obtained. The selected ion is then fragmented using, e.g., 
blackbody irradiation, infrared multiphoton dissociation, or 
collisional activation. For example, ions generated by elec- 
trospray ionization (ESI) can be fragmented using IR mul- 
tiphoton dissociation. This activation leads to dissociation of 
glycosidic bonds and the phosphate backbone, producing 
two series of fragment ions, called the w-series (having an 
intact 3' terminus and a 5' phosphate following internal 
cleavage) and the a-Base series (having an intact 5' terminus 
and a 3' furan). 

[0064] The second stage of mass analysis is then used to 
detect and measure the mass of these resulting fragments of 
product ions. Such ion selection followed by fragmentation 
routines can be performed multiple times so as to essentially 
completely dissect the molecular sequence of a sample. 

[0065] If there are two or more targets of similar base 
composition or mass, or if a single amplification reaction 
results in a product which has the same mass as two or more 
bioagent reference standards, they can be distinguished by 
using mass-modifying "tags." In this embodiment of the 
invention, a nucleotide analog or "tag" is incorporated 
during amplification (e.g., a 5-(trifluoromethyl) deoxythy- 
midine triphosphate) which has a different molecular weight 
than the unmodified base so as to improve distinction of 
masses. Such tags are described in, for example, PCT 
WO97/33000. This further limits the number of possible 
base compositions consistent with any mass. For example, 
5-(trifluoromethyl)deoxythymidine triphosphate can be used 
in place of dTTP in a separate nucleic acid amplification 
reaction. Measurement of the mass shift between a conven- 
tional amplification product and the tagged product is used 
to quantitate the number of thymidine nucleotides in each of 
the single strands. Because the strands are complementary, 
the number of adenosine nucleotides in each strand is also 
determined. 

[0066] In another amplification reaction, the number of G 
and C residues in each strand is determined using, for 
example, the cytidine analog 5-methylcytosine (5-meC) or 
propyne C. The combination of the A/T reaction and G/C 
reaction, followed by molecular weight determination, pro- 
vides a unique base composition. This method is summa- 
rized in FIG. 4 and Table 1. 

[Table 1: layout not recoverable from the scan. Legible column headings: Mass tag; Double strand sequence; Single strand sequence; Total mass; Base info; Base info. Legible sequence entries: AT*GC, AT*GCA; ATGC*, ATGC*A.] 



[0067] The mass tag phosphorothioate A (A*) was used to 
distinguish a Bacillus anthracis cluster. The B. anthracis 
(A₁₄G₉C₁₄T₉) had an average MW of 14072.26, and the B. 
anthracis (A₁A*₁₃G₉C₁₄T₉) had an average molecular 
weight of 14281.11 and the phosphorothioate A had an 
average molecular weight of +16.06 as determined by ESI- 
TOF MS. The deconvoluted spectra are shown in FIG. 5. 
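The mass-tag arithmetic can be sketched in a few lines: the shift between the untagged and tagged products, divided by the shift contributed by one tag, estimates how many tagged nucleotides were incorporated. The three numbers below are the ones reported in this paragraph; the function name is illustrative.

```python
def tagged_residue_count(untagged_mass, tagged_mass, shift_per_tag):
    """Estimate the number of incorporated tags from the overall mass shift."""
    return (tagged_mass - untagged_mass) / shift_per_tag

# Values from the phosphorothioate A example: masses in Da, +16.06 Da per tag.
estimate = tagged_residue_count(14072.26, 14281.11, 16.06)
n_tagged = round(estimate)
print(estimate, n_tagged)
```

The estimate lies very close to a whole number, which is what makes the count of tagged residues unambiguous at this mass accuracy.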






[0068] In another example, assume the measured molecu- 
lar masses of each strand are 30,000.115 Da and 31,000.115 
Da respectively, and the measured number of dT and dA 
residues are (30,28) and (28,30). If the molecular mass is 
accurate to 100 ppm, there are 7 possible combinations of 
dG+dC for each strand. However, if the measured 
molecular mass is accurate to 10 ppm, there are only 2 
combinations of dG+dC, and at 1 ppm accuracy there is only 
one possible base composition for each strand. 
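How mass accuracy narrows the possible base compositions can be sketched by brute-force enumeration. With the dA and dT counts fixed, every (dG, dC) pair whose computed strand mass falls within the stated ppm window is kept. The average residue masses and the end-group correction below are commonly used textbook values for an oligonucleotide with hydroxyl termini and are assumptions here, as are the function names; the sketch is not meant to reproduce the counts of 7, 2 and 1 quoted above, which depend on the specific strands.

```python
# Assumed average residue masses (Da) of nucleotides within a DNA chain.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
END_CORRECTION = -61.96  # assumed: 5'-OH and 3'-OH ends, no terminal phosphate

def strand_mass(a, c, g, t):
    """Average mass of a single strand with the given base counts."""
    return (a * RESIDUE_MASS["A"] + c * RESIDUE_MASS["C"]
            + g * RESIDUE_MASS["G"] + t * RESIDUE_MASS["T"] + END_CORRECTION)

def gc_candidates(measured_mass, a, t, ppm):
    """All (G, C) counts whose strand mass is within +/- ppm of the measurement."""
    tolerance = measured_mass * ppm * 1e-6
    max_g = int((measured_mass + tolerance) / RESIDUE_MASS["G"]) + 1
    max_c = int((measured_mass + tolerance) / RESIDUE_MASS["C"]) + 1
    hits = []
    for g in range(max_g + 1):
        for c in range(max_c + 1):
            if abs(strand_mass(a, c, g, t) - measured_mass) <= tolerance:
                hits.append((g, c))
    return hits

# Illustrative strand with 14 A, 9 G, 14 C and 9 T.
mass = strand_mass(a=14, c=14, g=9, t=9)
print(round(mass, 2), gc_candidates(mass, a=14, t=9, ppm=100))
```

Loosening the tolerance in the call lengthens the returned list, which is the effect the paragraph describes in the opposite direction.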

[0069] Signals from the mass spectrometer may be input 
to a maximum-likelihood detection and classification algo- 
rithm such as is widely used in radar signal processing. The 
detection processing uses matched filtering of BCS observed 
in mass-basecount space and allows for detection and sub- 
traction of signatures from known, harmless organisms, and 
for detection of unknown bioagent threats. Comparison of 
newly observed bioagents to known bioagents is also pos- 
sible, for estimation of threat level, by comparing their BCS 
to those of known organisms and to known forms of 
pathogenicity enhancement, such as insertion of antibiotic 
resistance genes or toxin genes. 

[0070] Processing may end with a Bayesian classifier 
using log likelihood ratios developed from the observed 
signals and average background levels. The program empha- 
sizes performance predictions culminating in probability-of- 
detection versus probability-of-false-alarm plots for condi- 
tions involving complex backgrounds of naturally occurring 
organisms and environmental contaminants. Matched filters 
consist of a priori expectations of signal values given the set 
of primers used for each of the bioagents. A genomic 
sequence database (e.g. GenBank) is used to define the mass 
basecount matched filters. The database contains known 
threat agents and benign background organisms. The latter is 
used to estimate and subtract the signature produced by the 
background organisms. A maximum likelihood detection of 
known background organisms is implemented using 
matched filters and a running-sum estimate of the noise 
covariance. Background signal strengths are estimated and 
used along with the matched filters to form signatures which 
are then subtracted. The maximum likelihood process is 
applied to this "cleaned up" data in a similar manner 
employing matched filters for the organisms and a running- 
sum estimate of the noise-covariance for the cleaned up data. 
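The processing chain described above (estimate background strength with a matched filter, subtract it, then score the cleaned data) can be illustrated with a deliberately simplified sketch. It assumes white Gaussian noise of known variance in place of the running-sum covariance estimate, treats each spectrum as a short list of intensities, and subtracts background signatures one at a time, which is exact only when the signatures do not overlap. All names and numbers are hypothetical.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def matched_filter_amplitude(observed, signature):
    """Least-squares strength of a known signature in the observed data."""
    return dot(observed, signature) / dot(signature, signature)

def subtract_background(observed, background_signatures):
    """Remove the estimated contribution of each known background signature."""
    cleaned = [float(x) for x in observed]
    for sig in background_signatures:
        amp = max(0.0, matched_filter_amplitude(cleaned, sig))
        cleaned = [x - amp * s for x, s in zip(cleaned, sig)]
    return cleaned

def detection_score(cleaned, threat_signature, noise_variance):
    """Log likelihood ratio (present vs absent) with ML-estimated amplitude."""
    amp = matched_filter_amplitude(cleaned, threat_signature)
    energy = dot(threat_signature, threat_signature)
    return amp * amp * energy / (2.0 * noise_variance)

# Hypothetical four-bin example: one background organism, one threat agent.
background = [1, 0, 0, 0]
threat = [0, 0, 1, 0]
observed = [5, 0, 2, 0]
cleaned = subtract_background(observed, [background])
print(cleaned, detection_score(cleaned, threat, noise_variance=1.0))
```

In practice the score would be compared against a threshold chosen from the probability-of-detection versus probability-of-false-alarm trade-off mentioned in the text.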

[0071] In one embodiment, a strategy to "triangulate" each 
organism by measuring signals from multiple core genes is 
used to reduce false negative and false positive signals, and 
enable reconstruction of the origin of hybrid or otherwise 
engineered bioagents. After identification of multiple core 
genes, alignments are created from nucleic acid sequence 
databases. The alignments are then analyzed for regions of 
conservation and variation, and potential primer binding 
sites flanking variable regions are identified. Next, amplifi- 
cation target regions for signature analysis are selected 
which distinguish organisms based on specific genomic 
differences (i.e., base composition). For example, detection 
of signatures for the three part toxin genes typical of B. 
anthracis (Bowen, J. E. and C. P. Quinn, J. Appl. Microbiol. 
1999, 87, 270-278) in the absence of the expected signatures 
from the B. anthracis genome would suggest a genetic 
engineering event. 
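One simple way to picture triangulation is as an intersection of candidate lists: each primer pair narrows the field to the organisms whose signature matches at that locus, and only organisms consistent with every locus survive. The sketch below is an assumption about how such a step might be coded; the candidate lists are invented for illustration and are not taken from any table.

```python
def triangulate(candidates_by_primer):
    """Organisms consistent with the signal from every primer pair."""
    sets = [set(c) for c in candidates_by_primer.values()]
    return set.intersection(*sets) if sets else set()

# Hypothetical matches for two primer pairs.
matches = {
    "16S_971": ["B. anthracis", "B. cereus"],
    "23S_855": ["B. anthracis", "S. aureus"],
}
print(triangulate(matches))
```

An empty result, where each locus matches a known organism but no single organism matches all of them, is the kind of inconsistency that could point to a hybrid or engineered agent.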

[0072] The present method can also be used to detect 
single nucleotide polymorphisms (SNPs), or multiple nucle- 
otide polymorphisms, rapidly and accurately. A SNP is 
defined as a single base pair site in the genome that is 
different from one individual to another. The difference can 
be expressed either as a deletion, an insertion or a substi- 
tution, and is frequently linked to a disease state. Because 
they occur every 100-1000 base pairs, SNPs are the most 
frequently found type of genetic marker in the human 
genome. 

[0073] For example, sickle cell anemia results from an 
A-T transversion, which encodes a valine rather than a 
glutamic acid residue. Oligonucleotide primers may be 
designed such that they bind to sequences which flank a SNP 
site, followed by nucleotide amplification and mass deter- 
mination of the amplified product. Because the molecular 
mass of the resulting product from an individual who does 
not have sickle cell anemia is different from that of the 
product from an individual who has the disease, the method 
can be used to distinguish the two individuals. Thus, the 
method can be used to detect any known SNP in an 
individual and thus diagnose or determine increased suscep- 
tibility to a disease or condition. 

[0074] In one embodiment, blood is drawn from an indi- 
vidual and peripheral blood mononuclear cells (PBMC) are 
isolated and simultaneously tested, preferably in a high- 
throughput screening method, for one or more SNPs using 
appropriate primers based on the known sequences which 
flank the SNP region. The National Center for Biotechnol- 
ogy Information maintains a publicly available database of 
SNPs (http://www.ncbi.nlm.nih.gov/SNP/). 
[0075] The method of the present invention can also be 
used for blood typing. The gene encoding A, B or O blood 
type can differ by four single nucleotide polymorphisms. If 
the gene contains the sequence CGTGGTGACCCTT, anti- 
gen A results. If the gene contains the sequence 
CGTCGTCACCGCTA, antigen B results. If the gene contains the 
sequence CGTGGT-ACCCCTT, blood group O results ("-" 
indicates a deletion). These sequences can be distinguished 
by designing a single primer pair which flanks these regions, 
followed by amplification and mass determination. 
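That the three sequences are distinguishable by mass can be checked directly from their base compositions. The sketch below counts the bases in each sequence as written above (ignoring the "-" deletion marker) and sums assumed average residue masses. It covers only the short single-strand stretches quoted in the text, not a full amplicon with its flanking primer sites, and the mass values and function names are assumptions.

```python
from collections import Counter

# Assumed average residue masses (Da) of nucleotides within a DNA chain.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}

ALLELES = {
    "A": "CGTGGTGACCCTT",
    "B": "CGTCGTCACCGCTA",
    "O": "CGTGGT-ACCCCTT",  # "-" marks the deletion
}

def composition(seq):
    """Base counts as an (A, C, G, T) tuple, skipping deletion markers."""
    counts = Counter(seq.replace("-", ""))
    return tuple(counts.get(b, 0) for b in "ACGT")

def residue_mass_sum(seq):
    """Sum of residue masses for the sequence (end groups not included)."""
    return sum(n * RESIDUE_MASS[b] for b, n in zip("ACGT", composition(seq)))

for name, seq in ALLELES.items():
    print(name, composition(seq), round(residue_mass_sum(seq), 2))
```

Each allele yields a different composition and therefore a different mass, so a single primer pair flanking the region is enough to tell them apart in principle.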

[0076] While the present invention has been described 
with specificity in accordance with certain of its preferred 
embodiments, the following examples serve only to illus- 
trate the invention and are not intended to limit the same. 

EXAMPLE 1 
[0077] Nucleic Acid Isolation and PCR 
[0078] In one embodiment, nucleic acid is isolated from 
the organisms and amplified by PCR using standard methods 
prior to BCS determination by mass spectrometry. Nucleic 
acid is isolated, for example, by detergent lysis of bacterial 
cells, centrifugation and ethanol precipitation. Nucleic acid 
isolation methods are described in, for example, Current 
Protocols in Molecular Biology (Ausubel et al.) and Molecu- 
lar Cloning: A Laboratory Manual (Sambrook et al.). The 
nucleic acid is then amplified using standard methodology, 
such as PCR, with primers which bind to conserved regions 
of the nucleic acid which contain an intervening variable 
sequence as described below. 

EXAMPLE 2 

[0079] Mass Spectrometry 

[0080] FTICR Instrumentation: 

[0081] The FTICR instrument is based on a 7 tesla 
actively shielded superconducting magnet and modified 
Bruker Daltonics Apex II 70e ion optics and vacuum cham- 
ber. The spectrometer is interfaced to a LEAP PAL autosam- 
pler and a custom fluidics control system for high through- 
put screening applications. Samples are analyzed directly 
from 96-well or 384-well microtiter plates at a rate of about 
1 sample/minute. The Bruker data-acquisition platform is 
supplemented with a lab-built ancillary NT datastation 
which controls the autosampler and contains an arbitrary 
waveform generator capable of generating complex rf-excite 
waveforms (frequency sweeps, filtered noise, stored wave- 
form inverse Fourier transform (SWIFT), etc.) for sophisti- 
cated tandem MS experiments. For oligonucleotides in the 
20-30-mer regime typical performance characteristics 
include mass resolving power in excess of 100,000 
(FWHM), low ppm mass measurement errors, and an oper- 
able m/z range between 50 and 5000 m/z. 

[0082] Modified ESI Source: 

[0083] In sample-limited analyses, analyte solutions are 
delivered at 150 nL/minute to a 30 μm i.d. fused-silica ESI 
emitter mounted on a 3-D micromanipulator. The ESI ion 
optics consist of a heated metal capillary, an rf-only hexa- 
pole, a skimmer cone, and an auxiliary gate electrode. The 
6.2 cm rf-only hexapole is comprised of 1 mm diameter rods 
and is operated at a voltage of 380 Vpp at a frequency of 5 
MHz. A lab-built electromechanical shutter can be employed 
to prevent the electrospray plume from entering the inlet 
capillary unless triggered to the "open" position via a TTL 
pulse from the data station. When in the "closed" position, 
a stable electrospray plume is maintained between the ESI 
emitter and the face of the shutter. The back face of the 
shutter arm contains an elastomeric seal which can be 
positioned to form a vacuum seal with the inlet capillary. 
When the seal is removed, a 1 mm gap between the shutter 
blade and the capillary inlet allows constant pressure in the 
external ion reservoir regardless of whether the shutter is in 
the open or closed position. When the shutter is triggered, a 
"time slice" of ions is allowed to enter the inlet capillary and 
is subsequently accumulated in the external ion reservoir. 
The rapid response time of the ion shutter (<25 ms) provides 
reproducible, user defined intervals during which ions can 
be injected into and accumulated in the external ion reser- 
voir. 

[0084] Apparatus for Infrared Multiphoton Dissociation 

[0085] A 25 watt CW CO₂ laser operating at 10.6 μm has 
been interfaced to the spectrometer to enable infrared mul- 
tiphoton dissociation (IRMPD) for oligonucleotide sequenc- 
ing and other tandem MS applications. An aluminum optical 
bench is positioned approximately 1.5 m from the actively 
shielded superconducting magnet such that the laser beam is 
aligned with the central axis of the magnet. Using standard 
IR-compatible mirrors and kinematic mirror mounts, the 
unfocused 3 mm laser beam is aligned to traverse directly 
through the 3.5 mm holes in the trapping electrodes of the 
FTICR trapped ion cell and longitudinally traverse the 
hexapole region of the external ion guide finally impinging 
on the skimmer cone. This scheme allows IRMPD to be 
conducted in an m/z selective manner in the trapped ion cell 
(e.g. following a SWIFT isolation of the species of interest), 
or in a broadband mode in the high pressure region of the 
external ion reservoir where collisions with neutral mol- 
ecules stabilize IRMPD-generated metastable fragment ions 
resulting in increased fragment ion yield and sequence 
coverage. 

EXAMPLE 3 
[0086] Identification of Bioagents 

[0087] Table 2 shows a small cross section of a database 
of calculated molecular masses for over 9 primer sets and 
approximately 30 organisms. The primer sets were derived 
from rRNA alignment. Examples of regions from rRNA 
consensus alignments are shown in FIGS. 1A-1C. Lines 
with arrows are examples of regions to which intelligent 
primer pairs for PCR are designed. The primer pairs are 
>95% conserved in the bacterial sequence database (cur- 
rently over 10,000 organisms). The intervening regions are 
variable in length and/or composition, thus providing the 
base composition "signature" (BCS) for each organism. 
Primer pairs were chosen so the total length of the amplified 
region is less than about 80-90 nucleotides. The label for 
each primer pair represents the starting and ending base 
number of the amplified region on the consensus diagram. 
[0088] Included in the short bacterial database cross-section
in Table 1 are many well known pathogens/biowarfare
agents (shown in bold/red typeface) such as Bacillus
anthracis or Yersinia pestis as well as some of the bacterial
organisms found commonly in the natural environment such
as Streptomyces. Even closely related organisms can be
distinguished from each other by the appropriate choice of
primers. For instance, two low G+C organisms, Bacillus
anthracis and Staphylococcus aureus, can be distinguished
from each other by using the primer pair defined by
16S_1337 or 23S_855 (ΔM of 4 Da).
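The database comparison described above can be sketched in a few lines of Python. This is a minimal illustration only: the organism labels, the reference masses, the tolerance and the function name `match_mass` are hypothetical placeholders and are not values or names taken from the tables of this application.

```python
from typing import Dict, List, Tuple

# Hypothetical reference masses (Da), keyed by (organism, primer pair).
# Placeholder numbers for illustration; not values from the tables.
REFERENCE_MASSES: Dict[Tuple[str, str], float] = {
    ("organism A", "16S_1337"): 28448.0,
    ("organism B", "16S_1337"): 28444.0,
    ("organism C", "16S_1337"): 29061.8,
}


def match_mass(
    measured: float,
    primer_pair: str,
    tolerance_da: float = 1.0,
    database: Dict[Tuple[str, str], float] = REFERENCE_MASSES,
) -> List[Tuple[str, float]]:
    """Return (organism, absolute mass error) for every database entry of the
    given primer pair that lies within tolerance_da of the measured mass,
    sorted with the closest match first."""
    hits = [
        (organism, abs(mass - measured))
        for (organism, pair), mass in database.items()
        if pair == primer_pair and abs(mass - measured) <= tolerance_da
    ]
    return sorted(hits, key=lambda hit: hit[1])


hits = match_mass(28447.6, "16S_1337")
```

With a tolerance smaller than the 4 Da separation mentioned above, two organisms whose amplification products differ by that amount would not be returned for the same measurement.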



16S_971 16S_1100 16S_1337 16S_1294 16S_1228 23S_1021 23S_855 23S_193 23S_115



Campylobacter jejuni 

Clostridium difficile 
Enterococcus faecalis 



55622.1 
56231.2 



543S7/I 

55621.1 
55011 

54386.9 
55007 
53767 

54386.9 
5 43S7. 'i 



28448 

28447.6 
28446.7 
28440.7 
28448 
29061.8 
29063 
28445 



35854.9 
35238 
35854.9 

35852.9 
35854 
35856.9 
35855 
35855 
35S5.V) 
35858.9 



51296 

51296.4 
51307.4 
51295.4 
50683 
50674.3 
50676 
51291 
51296.4 
51296.4 



30299 
311295 

30295 



42654 39557.5 54999 

42651 39560 56850 

42651 39560.5 56850.3 

42653 39559.5 51920.5 

30297 42029.9 38941.4 52524.6 



30294 
30295 
30300 

30294 
30297 



42032.9 39558.5 45732.5 

42036 38941 56230 

42656 39562 54999 

41417.8 39556.5 55612.2 

42652 39559.5 56849.3 






TABLE 2-continued 






16S_971 16S_1100 16S_1337 16S_1294 16S_1228 23S_1021 23S_855 23S_193 23S_115 



Francisella tularensis 
Haemophilus influenzae 
Klebsiella pneumoniae
I ion II i i i 
Mycobacterium avium 
Mycobacterium leprae 
Mycobacterium tuberculosis
Mycoplasma genitalium
Mycoplasma pneumoniae 
Neisseria gonorrhoeae 

Rickettsia prowazekii



Staphylococcus 
Streptomyces 
Treponema pall 



55620.1 
55622.1 
55618 

54389.9 
54300.') 
53143.7 
53143.7 
55627.1 



58094 
55622 
55623 

56854.3 
54389.9 
56245.2 



55629.1 

55620.1 

45115.4 

45118.4 

543X0.0 

55010 

55621 

•S',2J 

551(05 

55009 

543X6.0 



28444.7 
28442.7 
28446 

20064.8 
20064.8 

20061.8 
20061.8 
28445.7 
28443 
28448 



2X443.7 
20063.8 
28445.7 
28443 
28444.7 
28443 



35K5J 
35857 
35S57 



30301 42656 



50671.3 
50673.3 
51302.4 
51301 
50677 
50679 

301 

301 



52536 
50064.2 
51299 



30299 
30294 
30204 
30300 



30298 
30200 



43264.1 
43264.1 
42640 
43272 
42650 



56842.4 
56843.4 

55000 



42034.9 38939.4 



[0089] FIG. 6 shows the use of ESI-FT-ICR MS for
measurement of exact mass. The spectra from 46mer PCR
products originating at position 1337 of the 16S rRNA from
S. aureus (upper) and B. anthracis (lower) are shown. These
data are from the region of the spectrum containing signals
from the [M-8H+]8- charge states of the respective 5'-3'
strands. The two strands differ by two (AT→CG) substitutions,
and have measured masses of 14206.396 and
14208.373±0.010 Da, respectively. The possible base
compositions derived from the masses of the forward and reverse
strands for the B. anthracis products are listed in Table 3.
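As an illustration of how a measured strand mass constrains base composition, the following minimal Python sketch enumerates every (A, G, C, T) count for a strand of fixed length whose calculated mass lies within a tolerance of the measurement. The residue masses, the terminal-group correction and the function names are assumptions introduced here for illustration; they are not taken from this application, and they give masses that differ from the calculated values in Table 3 by a few millidaltons.

```python
from typing import List, Tuple

# Approximate monoisotopic masses (Da) of DNA nucleotide residues within a
# chain. Assumed textbook values, not taken from the application.
RESIDUE_MASS = {"A": 313.0576, "G": 329.0525, "C": 289.0464, "T": 304.0460}
WATER = 18.0106  # H2O added to account for the free 5'-OH and 3'-OH ends
HPO3 = 79.9663   # removed because no terminal phosphate is assumed


def strand_mass(a: int, g: int, c: int, t: int) -> float:
    """Neutral monoisotopic mass of a single strand with the given base counts."""
    return (
        a * RESIDUE_MASS["A"]
        + g * RESIDUE_MASS["G"]
        + c * RESIDUE_MASS["C"]
        + t * RESIDUE_MASS["T"]
        + WATER
        - HPO3
    )


def candidate_compositions(
    measured: float, length: int, tol: float
) -> List[Tuple[int, int, int, int]]:
    """All (A, G, C, T) counts summing to `length` whose calculated mass lies
    within `tol` Da of the measured mass."""
    hits = []
    for a in range(length + 1):
        for g in range(length + 1 - a):
            for c in range(length + 1 - a - g):
                t = length - a - g - c
                if abs(strand_mass(a, g, c, t) - measured) <= tol:
                    hits.append((a, g, c, t))
    return hits


# Deliberately loose tolerance, to absorb the offset of the assumed masses.
candidates = candidate_compositions(14208.373, 46, 0.1)
```

Exhaustive enumeration is adequate here because a 46-mer has fewer than twenty thousand possible compositions. Narrowing the tolerance toward the stated measurement error shortens the candidate list.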

TABLE 3

Calc. Mass      Base Comp.

14208.2935      A1 G17 C10 T18
14208.3160      A1 G20 C15 T10
14208.3386      A1 G23 C20 T2
14208.3074      A6 G11 C3 T26
14208.3300      A6 G14 C8 T18
14208.3525      A6 G17 C13 T10
14208.3751      A6 G20 C18 T2
14208.3439      A11 G8 C1 T26
14208.3665      A11 G11 C6 T18
14208.3890      A11 G14 C11 T10
                A11 G17 C16 T2
14208.4030      A16 G8 C4 T18
14208.4255      A16 G11 C9 T10
14208.4481      A16 G14 C14 T2
14208.4395      A21 G5 C2 T18
14208.4620      A21 G8 C7 T10
14079.2624      A0 G14 C13 T19
14079.2849      A0 G17 C18 T11
14079.3075      A0 G20 C23 T3
14079.2538      A5 G5 C1 T35
14079.2764      A5 G8 C6 T27
14079.2989      A5 G11 C11 T19
14079.3214      A5 G14 C16 T11
                A10 G5 C4 T27
                A10 G8 C9 T19
                A10 G11 C14 T11
                A15 G8 C12 T11
                A15 G11 C17 T3
                A20 G2 C5 T19
                A20 G5 C10 T13



[0090] Among the 16 compositions for the forward strand
and the 18 compositions for the reverse strand that were
calculated, only one pair (shown in bold) is complementary,
corresponding to the actual base compositions of the B.
anthracis PCR products.
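The complementarity filter described above can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and the candidate lists are a small subset of the compositions legible in Table 3, written as (A, G, C, T) counts. Two strands are treated as complementary when the A, G, C and T counts of one equal the T, C, G and A counts of the other.

```python
from typing import List, Tuple

Composition = Tuple[int, int, int, int]  # (A, G, C, T) counts


def complement(comp: Composition) -> Composition:
    """Base composition of the strand complementary to `comp`."""
    a, g, c, t = comp
    return (t, c, g, a)


def complementary_pairs(
    forward: List[Composition], reverse: List[Composition]
) -> List[Tuple[Composition, Composition]]:
    """Forward/reverse candidate pairs that are exact complements."""
    reverse_set = set(reverse)
    return [(f, complement(f)) for f in forward if complement(f) in reverse_set]


# Subset of the candidates legible in Table 3 (illustrative selection).
forward_candidates = [(1, 17, 10, 18), (6, 17, 13, 10), (11, 14, 11, 10), (16, 11, 9, 10)]
reverse_candidates = [(0, 14, 13, 19), (5, 11, 11, 19), (10, 11, 14, 11), (15, 8, 12, 11)]

pairs = complementary_pairs(forward_candidates, reverse_candidates)
```

For this subset the only surviving pair is A11 G14 C11 T10 (forward) with A10 G11 C14 T11 (reverse).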

EXAMPLE 4 

[0091] BCS of Region from Bacillus anthracis and Bacillus cereus

[0092] A conserved Bacillus region from B. anthracis
(A14G9C14T9) and B. cereus (A15G9C13T9) having a C to A
base change was synthesized and subjected to ESI-TOF MS.
The results are shown in FIG. 7 in which the two regions are
clearly distinguished using the method of the present
invention (MW=14072.26 vs. 14096.29).
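The size of the mass shift produced by a single base change can be estimated with a short Python sketch. The average residue masses below are assumed textbook values rather than figures from this application, and the names are hypothetical.

```python
# Approximate average masses (Da) of DNA nucleotide residues within a chain.
# Assumed textbook values, not taken from the application.
AVERAGE_RESIDUE_MASS = {"A": 313.21, "G": 329.21, "C": 289.18, "T": 304.20}


def substitution_shift(old_base: str, new_base: str) -> float:
    """Mass change (Da) when one old_base in a strand is replaced by new_base."""
    return AVERAGE_RESIDUE_MASS[new_base] - AVERAGE_RESIDUE_MASS[old_base]


# Shift for the C to A change described above.
c_to_a = substitution_shift("C", "A")

# Every possible single-base substitution and its mass shift.
all_shifts = {
    (old, new): substitution_shift(old, new)
    for old in AVERAGE_RESIDUE_MASS
    for new in AVERAGE_RESIDUE_MASS
    if old != new
}
```

Under these assumed masses the C to A shift is about 24.03 Da, in line with the difference between the two molecular weights quoted above. The smallest shift in magnitude is the A/T exchange, at roughly 9 Da.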






[0093] Identification of Additional Bioagents 

[0094] In other examples of the present invention, the
pathogen Vibrio cholerae can be distinguished from Vibrio
parahemolyticus with ΔM>600 Da using one of three 16S
primer sets shown in Table 2 (16S_971, 16S_1228 or
16S_1294) as shown in Table 4. The two mycoplasma
species in the list (M. genitalium and M. pneumoniae) can
also be distinguished from each other, as can the three
mycobacteriae. While the direct mass measurements of
amplified products can identify and distinguish a large
number of organisms, measurement of the base composition
signature provides dramatically enhanced resolving
power for closely related organisms. In cases such as
Bacillus anthracis and Bacillus cereus that are virtually
indistinguishable from each other based solely on mass differences,
compositional analysis or fragmentation patterns are used to
resolve the differences. The single base difference between
the two organisms yields different fragmentation patterns,
and despite the presence of the ambiguous/unidentified base
N at position 20 in B. anthracis, the two organisms can be
identified.

[0095] Tables 4a-b show examples of primer pairs from 
Table 1 which distinguish pathogens from background. 



23S_855 16S_1337 23S_1021



Base comp. 



Mycobacterium 
Streptomyces sp. 
Ureaplasma urealyticum 



Mycobacterium leprae 



( /.'-.iridium [h-ifringem 



Aeromonas hydrophila 
Escherichia coli 
Pseudomonas putida 



li,-nMlti /'/>«, 7//s, ■;)//(■■/ 



Helicobacter pylori 



A,„C. t .e„,r,„ 
A 17 G 38 C 27 T 14 
A 18 G, r ,C 17 T 17 
A r ,C,,„C ',,r„. 
V C C I j 

a 20 c;„c 21 t„, 

A, 1 (i, 6 C-,-,T I ,> 
A-,fi-, 7 C LQ T LQ 
A-,fi-, 7 C LO T-, L 
A-, 1 G-, 8 C\ 1 T 18 
A-,fi : , n C-,-,T L6 
A 22 G 27 C 20 T 19 
A 22 G 2S C 20 T 1S 
A 22 G 2S C 21 T 17 
A 22 G 29 C 23 T 15 
A 22 G 32 C 20 T 16 
A 23 G 20 C 14 T 16 
A„G, 6 C\ n T 19 
A 23 G 26 C 21 T 18 
A 23 G 26 C 21 T 19 
A 23 G 26 C 24 T 15 
A 23 C 2r ,C 24 T 15 
A 23 G 27 C 19 T 19 

A 2J <; 27 (; 20'"l8 

A 23 G 27 C 20 nT 18 

A 2i (i 27 ('2o'l'lS 

A 23 G 29 C 21 T 16 
A 23 G 29 C 21 T 16 
A 23 G 29 C 21 T 17 
A 23 G 29 C 22 T 15 

A 2l (; 2 ,,( ' 22 r r, 5 

A 23 G 30 C 21 T 16 
A 23 G 31 C 21 T 15 
A 23 G 31 C 21 T 1S 
A 24 G 19 C 12 T 18 
A 24 G 25 C 18 T 20 
A 24 G 26 C 19 T 14 
A 24 G 26 C 19 T 19 
A 24 G 26 C 20 T 18 
A 24 G 26 C 20 T 18 

A 24 G 2( ,('2,;r, s 

A 24 G 26 C 20 T 19 
A 24 G 26 C 21 T 1S 
A 24 G 26 C 23 T 16 
A 24 G 28 C 20 T 17 



[0097] Table 4 shows the expected molecular weight and 
base composition of region 16S_1100-1188 in Mycobacte- 
rium avium and Streptomyces sp. 



Molecular weight    Base comp.

25624.1728          A16G32C18T16
29904.871           A17G38C27T14



C. pneumonia AR39 



. pseudotuberculos 



Borrelia burgdorferi 
Streptobacillus monilifo, 
Rickettsia prowazekii
Rickettsia rickettsii 
Mycoplasma mycoides 



A 24 C: 28 (' 2 ,T,„ 
A 24 G 29 C 21 T 16 
A 24 G 30 C 21 T 15 
A 24 G 30 C 21 T 15 
A 24 (i ((1 c: 2 ,T, 5 

Az5G 24 C 18 T 21 
AzsG^CjbTzo 
Az5G 25 C 19 T 19 
Az5G 26 C 20 T 19 
Az5G 27 C 16 T 22 

A 25°27 C 21 T 16 

Az5G 29 C 17 T 19 
A 26 G 26 C 20 T 16 
A 26 G 28 C 18 T 18 
A 26 G 28 C 20 T 16 
A 28 G 23 C 16 T 20 



[0098] Table 5 shows base composition (single strand)
results for 16S_1100-1188 primer amplification reactions for
different species of bacteria. Species which are repeated in
the table (e.g., Clostridium botulinum) are different strains
which have different base compositions in the 16S_1100-1188
region.



[0099] Entries for the same organism that have different
base compositions are different strains. Groups of organisms
which are highlighted or in italics have the same base
compositions in the amplified region. Some of these organisms
can be distinguished using multiple primers. For example,
Bacillus anthracis can be distinguished from Bacillus cereus and






Bacillus thuringiensis using the primer 16S_971-1062
(Table 6). Other primer pairs which produce unique base
composition signatures are shown in Table 6 (bold). Clusters
containing very similar threat and ubiquitous non-threat
organisms (e.g. anthracis cluster) are distinguished at high
resolution with focused sets of primer pairs. The known
biowarfare agents in Table 6 are Bacillus anthracis, Yersinia
pestis, Francisella tularensis and Rickettsia prowazekii.

TABLE 6

Organism 16S_971-1062 16S_1228-1310 16S_1100-1188 



Aeromonas hydrophila         A21G29C22T20
Bacillus anthracis           A21G27C22T22
Bacillus cereus              A22G27C21T22
Bacillus                     A22G27C21T22
Chlamydia trachomatis        A^G^C^T^
Chlamydia AR39               A26G23C20T22
Leptospira borgpetersenii    A22G26C20T21
Leptospira                   A22G26C20T21
Mycoplasma genitalium        A28G23C15T22
Mycoplasma pneumoniae        A28G23C15T22



a 2 .,g 2 „c 22 t 2 , 

A^C^ 



A 21 G 2(i C 24 T 2 5 
A 21 G 26 C 25 T 24 



A 22 G 27 C 21 T 13 

A 22 G 27 C 21 T 13 

A 24 G 22 C 19 T 18 
A 24 G 22 C 19 T 18 
A 24 G 22 C 19 T 18 

A 24 G 23 C 19 T 16 

A, 6 G,,C 16 T 18 

A^G^C^T^ 

A 3 0 G 18 C 15 T 19 

A 27 G 19 C 16 T 20 



A 2( ,G 24 C,.;r 14 

A,,<i,.,C\,„T,, 
A . S G . ,C .,,'1', , 



A 23 G 26 C 17 T 17 
A 24 G 23 C 16 T 19 
A 24 G 24 C 17 T 17 



A 23 G 31 C 21 T 15 

A 23 G 31 C 21 T 15 

A^G^T^ 
A 23 G 27 C 20 T 18 
A 23 G 27 C 20 T 18 

A, 4 G, 8 C 21 T 16 

A2 4 G 2 sC 2 iT 16 

A 23 G 26 C 24 T 15 
A 2 3G 26 C 24 T 15 
A 24 G l9 C l2 T l8 

A 2i c; 20 c 14 T lf , 



A,,(.i„,C\,T, s 
A,,G, n C,,T 1: , 



A 24 G 26 C 19 T I9 
A 26 G 28 C 18 T 18 
A 26 G 28 C 20 T 16 



[0100] The sequence of B. anthracis and B. cereus in 
region 16S_971 is shown below. Shown in bold is the single 
base difference between the two species which can be 
detected using the methods of the present invention. B. 
anthracis has an ambiguous base at position 20. 



B. cereus_16S_971

EXAMPLE 6

[0101] ESI-TOF MS of sspE 56-mer Plus Calibrant

[0102] The mass measurement accuracy that can be
obtained using an internal mass standard in the ESI-MS
study of PCR products is shown in FIG. 8. The mass
standard was a 20-mer phosphorothioate oligonucleotide
added to a solution containing a 56-mer PCR product from
the B. anthracis spore coat protein sspE. The mass of the
expected PCR product distinguishes B. anthracis from other
species of Bacillus such as B. thuringiensis and B. cereus.

EXAMPLE 7

[0103] B. anthracis ESI-TOF Synthetic 16S_1228
Duplex

[0104] An ESI-TOF MS spectrum was obtained from an
aqueous solution containing 5 μM each of synthetic analogs
of the expected forward and reverse PCR products from the
nucleotide 1228 region of the B. anthracis 16S rRNA gene.
The results (FIG. 9) show that the molecular weights of the
forward and reverse strands can be accurately determined
and easily distinguish the two strands. The [M-21H+]21- and
[M-20H+]20- charge states are shown.

EXAMPLE 8

[0105] ESI-FTICR-MS of Synthetic B. anthracis 16S_1337
46 Base Pair Duplex

[0106] An ESI-FTICR-MS spectrum was obtained from
an aqueous solution containing 5 μM each of synthetic
analogs of the expected forward and reverse PCR products
from the nucleotide 1337 region of the B. anthracis 16S
rRNA gene. The results (FIG. 10) show that the molecular
weights of the strands can be distinguished by this method.
The [M-16H+]16- through [M-10H+]10- charge states are
shown. The insert highlights the resolution that can be
realized on the FTICR-MS instrument, which allows the
charge state of the ion to be determined from the mass
difference between peaks differing by a single 13C
substitution.

EXAMPLE 9

[0107] ESI-TOF MS of 56-mer Oligonucleotide from
saspB Gene of B. anthracis with Internal Mass Standard

[0108] ESI-MS spectra were obtained on a synthetic
56-mer oligonucleotide (5 μM) from the saspB gene of B.
anthracis containing an internal mass standard at an ESI
flow rate of 1.7 μL/min as a function of sample consumption.
The results (FIG. 11) show that the signal-to-noise ratio is
improved as more scans are summed, and that the standard
and the product are visible after only 100 scans.

EXAMPLE 10

[0109] ESI-TOF MS of an Internal Standard with
Tributylammonium (TBA)-Trifluoroacetate (TFA) Buffer

[0110] An ESI-TOF-MS spectrum of a 20-mer
phosphorothioate mass standard was obtained following addition of
5 mM TBA-TFA buffer to the solution. This buffer strips
charge from the oligonucleotide and shifts the most
abundant charge state from [M-8H+]8- to [M-3H+]3- (FIG. 12).
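The charge-state notation used in the examples above can be related to observed m/z values with a short Python sketch. It assumes a multiply deprotonated ion [M-zH]z- and uses standard values for the proton mass and the 13C/12C mass difference. The function names are hypothetical, and the 14208.373 Da input is simply the measured mass quoted earlier for the B. anthracis strand.

```python
PROTON = 1.007276         # mass of a proton, Da
C13_MINUS_C12 = 1.003355  # mass difference between 13C and 12C, Da


def mz_negative(neutral_mass: float, z: int) -> float:
    """m/z of the [M-zH]z- ion formed by removing z protons from a neutral."""
    return (neutral_mass - z * PROTON) / z


def charge_from_isotope_spacing(delta_mz: float) -> int:
    """Charge state inferred from the m/z spacing between adjacent isotope
    peaks, i.e. peaks differing by a single 13C substitution."""
    return round(C13_MINUS_C12 / delta_mz)


# m/z of the 8- and 3- charge states of a 14208.373 Da strand.
mz_8 = mz_negative(14208.373, 8)
mz_3 = mz_negative(14208.373, 3)

# Adjacent isotope peaks of an 8- ion are spaced by about 1.003355 / 8 in m/z.
z = charge_from_isotope_spacing(C13_MINUS_C12 / 8)
```

Stripping charge moves the same species to higher m/z, since fewer charges divide nearly the same mass. The isotope spacing narrows as the charge increases, which is why resolving adjacent isotope peaks allows the charge state to be read directly from the spectrum.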






What is claimed is: 

1. A method of identifying an unknown bioagent com- 
prising: 

(a) contacting nucleic acid from said bioagent with at least 
one pair of oligonucleotide primers which hybridize to 
sequences of said nucleic acid, wherein said sequences 
flank a variable nucleic acid sequence of the bioagent; 

(b) amplifying said variable nucleic acid sequence to 
produce an amplification product; 

(c) determining the molecular mass of said amplification 
product; and 

(d) comparing said molecular mass to one or more 
molecular masses of amplification products obtained 
by performing steps (a)-(c) on a plurality of known 
organisms, wherein a match identifies said unknown 
bioagent. 

2. The method of claim 1, wherein said sequences to 
which said at least one pair of oligonucleotide primers 
hybridize are highly conserved. 

3. The method of claim 1, wherein said amplifying step 
comprises polymerase chain reaction. 

4. The method of claim 1, wherein said amplifying step 
comprises ligase chain reaction or strand displacement 
amplification. 

5. The method of claim 1, wherein said bioagent is a 
bacterium, virus, cell or spore. 

6. The method of claim 1, wherein said nucleic acid is 
ribosomal RNA. 

7. The method of claim 1, wherein said nucleic acid 
encodes RNase P or an RNA-dependent RNA polymerase. 

8. The method of claim 1, wherein said amplification 
product is ionized prior to molecular mass determination. 

9. The method of claim 1, further comprising the step of 
isolating nucleic acid from said bioagent prior to contacting 
said nucleic acid with said at least one pair of oligonucle- 
otide primers. 

10. The method of claim 1, further comprising the step of 
performing steps (a)-(d) using a different oligonucleotide
primer pair and comparing the results to one or more 
molecular mass amplification product obtained by perform- 
ing steps (a)-(c) on a different plurality of known organisms 
from those in step (d). 

11. The method of claim 1, wherein said one or more
molecular masses are contained in a database of molecular
masses.

12. The method of claim 1, wherein said amplification
product is ionized by electrospray ionization, matrix-as- 
sisted laser desorption or fast atom bombardment. 

13. The method of claim 1, wherein said molecular mass 
is determined by mass spectrometry. 

14. The method of claim 11, wherein said mass spectrom- 
etry is selected from the group consisting of Fourier trans- 
form ion cyclotron resonance mass spectrometry (FT-ICR- 
MS), ion trap, quadrupole, magnetic sector, time of flight 
(TOF), Q-TOF and triple quadrupole. 

15. The method of claim 1, further comprising performing 
step (b) in the presence of an analog of adenine, thymidine, 
guanosine or cytidine having a different molecular weight 
than adenosine, thymidine, guanosine or cytidine. 

16. The method of claim 1, wherein said oligonucleotide
primer comprises a base analog at positions 1 and 2 of each
triplet within said primer, wherein said base analog binds
with increased affinity to its complement compared to the
native base.

17. The method of claim 16, wherein said primer com- 
prises a universal base at position 3 of each triplet within 
said primer. 

18. The method of claim 16, wherein said base analog is 
selected from the group consisting of 2,6-diaminopurine, 
propyne T, propyne G, phenoxazines and G-clamp. 

19. The method of claim 16, wherein said universal base
is selected from the group consisting of inosine, guanidine
uridine, 5-nitroindole, 3-nitropyrrole, dP, dK, and
1-(2-deoxy-β-D-ribofuranosyl)-imidazole-4-carboxamide.

20. A method of identifying an unknown bioagent com- 
prising: 

contacting nucleic acid from said bioagent with at least 
one pair of oligonucleotide primers which hybridize to 
sequences of said nucleic acid, wherein said sequences 
flank a variable nucleic acid sequence; 

amplifying said variable nucleic acid sequence to produce 
an amplification product; 

determining the base composition of said amplification 
product; and 

comparing said base composition to one or more base
compositions of amplification products obtained by
performing steps (a)-(c) on a plurality of known organisms,
wherein a match identifies said unknown bioagent.

21. The method of claim 20, wherein said sequences to 
which said at least one pair of oligonucleotide primers 
hybridize are highly conserved. 

22. The method of claim 20, wherein said amplifying step 
comprises polymerase chain reaction. 

23. The method of claim 20, wherein said amplifying step
comprises ligase chain reaction or strand displacement 
amplification. 

24. The method of claim 20, wherein said bioagent is a 
bacterium, virus, cell or spore. 

25. The method of claim 20, wherein said nucleic acid is 
ribosomal RNA. 

26. The method of claim 20, wherein said nucleic acid 
encodes RNase P or an RNA-dependent RNA polymerase. 

27. The method of claim 20, wherein said amplification 
product is ionized prior to base composition determination. 

28. The method of claim 20, further comprising the step 
of isolating nucleic acid from said bioagent prior to con- 
tacting said nucleic acid with said at least one pair of 
oligonucleotide primers. 

29. The method of claim 20, further comprising the step 
of performing steps (a)-(d) using a different oligonucleotide 
primer pair and comparing the results to one or more base 
compositions of amplification product obtained by perform- 
ing steps (a)-(c) on a different plurality of known organisms 
from those in step (d). 

30. The method of claim 20, wherein said one or more 
base composition signatures are contained in a database of 
base composition signatures. 

31. The method of claim 20, wherein said amplification 
product is ionized by electrospray ionization, matrix-as- 
sisted laser desorption or fast atom bombardment. 

32. The method of claim 20, wherein said base compo- 
sition signature is determined by mass spectrometry. 






33. The method of claim 32, wherein said mass spectrom- 
etry is selected from the group consisting of Fourier trans- 
form ion cyclotron resonance mass spectrometry (FT-ICR- 
MS), ion trap, quadrupole, magnetic sector, time of flight 
(TOF), Q-TOF and triple quadrupole.

34. The method of claim 20, further comprising perform- 
ing step (b) in the presence of an analog of adenine, 
thymidine, guanosine or cytidine having a different molecu- 
lar weight than adenosine, thymidine, guanosine or cytidine. 

35. The method of claim 20, wherein said oligonucleotide 
primer comprises a base analog at positions 1 and 2 of each 
triplet within said primer, wherein said base analog binds 
with increased affinity to its complement compared to the 
native base. 

36. The method of claim 35, wherein said primer com- 
prises a universal base at position 3 of each triplet within 
said primer. 

37. The method of claim 35, wherein said base analog is 
selected from the group consisting of 2,6-diaminopurine, 
propyne T, propyne G, phenoxazines and G-clamp. 

38. The method of claim 36, wherein said universal base
is selected from the group consisting of inosine, guanidine
uridine, 5-nitroindole, 3-nitropyrrole, dP, dK, and
1-(2-deoxy-β-D-ribofuranosyl)-imidazole-4-carboxamide.

39. A method for detecting a single nucleotide polymor- 
phism in an individual, comprising the steps of: 

isolating nucleic acid from said individual; 

contacting said nucleic acid with oligonucleotide primers
which hybridize to regions of said nucleic acid which
flank a region comprising said potential polymorphism;

amplifying said region to produce an amplification product;



determining the molecular mass of said amplification 
product; 

comparing said molecular mass to the molecular mass of 
said region in an individual known to have said poly- 
morphism, wherein if said molecular masses are the 
same then said individual has said polymorphism. 

40. The method of claim 39, wherein said polymorphism 
is associated with a disease. 

41. The method of claim 39, wherein said polymorphism 
is a blood group antigen. 

42. The method of claim 39, wherein said amplification 
step is the polymerase chain reaction. 

43. The method of claim 39, wherein said amplification
step is ligase chain reaction or strand displacement
amplification.

44. The method of claim 39, wherein said amplification 
product is ionized prior to mass determination. 

45. The method of claim 39, wherein said amplification 
product is ionized by electrospray ionization, matrix-as- 
sisted laser desorption or fast atom bombardment. 

46. The method of claim 39, wherein said primers hybrid- 
ize to conserved sequences. 

47. The method of claim 39, wherein said molecular mass 
is determined by mass spectrometry. 

48. The method of claim 47, wherein said mass spectrom- 
etry is selected from the group consisting of Fourier trans- 
form ion cyclotron resonance mass spectrometry (FT-ICR- 
MS), ion trap, quadrupole, magnetic sector, time of flight 
(TOF), Q-TOF and triple quadrupole. 



